Abstract

This paper provides a probabilistic analysis of the so-called "strong" linear programming relaxation of the k-median problem. The analysis is performed under four classical models in location theory, the Euclidean, network, tree and uniform cost models. For example, we show that, for the Euclidean model and $\log n \le k \le n/(\log n)^2$, the value of the relaxation is almost surely within .3 percent of the optimum k-median value. A similar analysis is performed for the other models. We also show that, under various assumptions, branch and bound algorithms that use this relaxation as a bound must almost surely expand a non-polynomial number of nodes to solve the k-median problem to optimality. Finally, we report extensive computational experiments. As predicted by the probabilistic analysis, the relaxation was not as tight for the problem instances drawn from the uniform cost model as for the other models.

1. Introduction


The  k-median  problem  has  been  widely  studied  both  from  the  theoretical 
point  of  view  and  for  its  applications.  An  interesting  theoretical 
development  was  the  successful  probabilistic  analysis  of  several  heuristics 
for  this  problem  (e.g.  Fisher  and  Hochbaum  [8]  and  Papadimitriou  [22]).  On 
the  other  hand,  the  literature  on  the  k-median  problem  abounds  in  exact 
algorithms.  Most  are  based  on  the  solution  of  a  certain  relaxation  to  be
defined  later.  The  computational  experience  reported  in  the  literature  seems 
to  indicate  that  this  particular  relaxation  yields  impressively  tight  bounds 
compared  to  what  can  usually  be  expected  in  integer  programming.  In  this 
paper  we  analyze  to  what  extent  this  relaxation  is  tight.  We  perform  our 
analysis  under  various  probabilistic  assumptions  and  identify  conditions  under 
which  the  relaxation  can  be  expected  to  be  tight  and  others  under  which  it  can 
be  expected  to  give  a  poor  bound.  For  example,  for  a  classical  Euclidean
model  in  the  plane,  we  show  that  the  relaxation  can  be  expected  to  provide  a 
bound  within  one  third  of  one  percent  of  the  optimum  value  of  the  k-median 
problem.  In  addition  to  the  probabilistic  analysis,  we  also  report  extensive 
computational  experiments,  based  on  the  solution  of  thousands  of  medium-size 
problems.  Some  of  the  results  predicted  for  very  large  problems  by  our 
probabilistic  analysis  can  already  be  observed  on  these  test  problems. 

Consider a set $X = \{X_1,\dots,X_n\}$ of $n$ points, a positive integer $k \le n$ and let $d_{ij} \ge 0$ be the distance between $X_i$ and $X_j$ for each $1 \le i \le n$ and $1 \le j \le n$. (Unless otherwise specified, it is assumed that $d_{ii} = 0$, $d_{ij} = d_{ji}$ and $d_{ij} \le d_{ik} + d_{kj}$ for all $i,j,k$.) The k-median problem consists of finding a set $S \subseteq X$, $|S| = k$, that minimizes $\sum_{i=1}^{n} \min_{j \in S} d_{ij}$. (Here $|S|$ denotes the cardinality of the set $S$.) The k-median problem has the following integer programming formulation.
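
\[ z_{IP} = \min \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} y_{ij} \tag{1} \]

\[ \sum_{j=1}^{n} y_{ij} = 1 \quad \text{for } i = 1,\dots,n \tag{2} \]

\[ \sum_{j=1}^{n} x_j = k \tag{3} \]

\[ 0 \le y_{ij} \le x_j \le 1 \quad \text{for } i,j = 1,\dots,n \tag{4} \]

\[ x_j \in \{0,1\} \quad \text{for } j = 1,\dots,n. \tag{5} \]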

In this formulation $x_j = 1$ if $X_j \in S$, 0 otherwise and, for $1 \le i \le n$, we can set $y_{ij} = 1$ for an index $j$ that achieves $\min_{j \in S} d_{ij}$.
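
The definition of $z_{IP}$ can be made concrete with a minimal sketch that enumerates all $k$-subsets. The function names and the four-point toy instance are illustrative assumptions, not part of the paper, and the enumeration is only viable for very small $n$.

```python
from itertools import combinations

def kmedian_cost(d, S):
    """Sum over all points i of the distance to the nearest median in S."""
    return sum(min(d[i][j] for j in S) for i in range(len(d)))

def brute_force_kmedian(d, k):
    """Return (z_IP, S) by enumerating every k-subset of the points."""
    n = len(d)
    best = min(combinations(range(n), k), key=lambda S: kmedian_cost(d, S))
    return kmedian_cost(d, best), best

# Hypothetical toy instance: four points on a line at 0, 1, 5 and 6.
pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
z_ip, S = brute_force_kmedian(d, 2)  # one median per cluster is optimal
```

Here one median is picked from each of the two clusters, so every non-median point is at distance 1 from its median.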


The formulation (1)-(4) is called the linear programming (LP) relaxation of the k-median problem. In other words, the LP relaxation is obtained by ignoring the integrality conditions on $x_j$, $1 \le j \le n$. The optimum value $z_{LP}$ of this relaxation clearly satisfies $z_{LP} \le z_{IP}$. The bound $z_{LP}$ has been used extensively in exact algorithms for the k-median problem. (E.g. Marsten [15], Garfinkel, Neebe and Rao [10], ReVelle and Swain [23], Diehr [5], Shrage [24], Guignard and Spielberg [11], Narula, Ogbu and Samuelsson [20], Cornuejols, Fisher and Nemhauser [3], Erlenkotter [6], Galvao [9], Magnanti and Wong [14], Nemhauser and Wolsey [21], Mulvey and Crowder [19], Mavrides [16], Mirchandani, Oudjit and Wong [17], Christofides and Beasley [2], Beasley [1].)

Most of the computational experience has been reported on test problems with $n \le 100$. For many of these test problems, $z_{IP} = z_{LP}$. Recently, Beasley [1] solved forty larger problems (with $100 \le n \le 900$) and found a small but positive gap $z_{IP} - z_{LP}$ for many of them. The average of $(z_{IP} - z_{LP})/z_{IP}$ over these problems was .0024.

In this paper we analyze the ratio $(z_{IP} - z_{LP})/z_{IP}$ from a probabilistic point of view as $n$ goes to infinity, under various assumptions on the probability distribution of problem instances. We do not address the worst-case analysis of this ratio except to note that this question was solved by Cornuejols, Fisher and Nemhauser [3] when $d_{ij} \le 0$. The analysis of [3] does not carry over when the $d_{ij}$'s are nonnegative and satisfy the distance axioms. In fact, this worst-case analysis is an interesting open question. It would also be interesting to know the worst-case value of $(z_{IP} - z_{LP})/z_{IP}$ when the $d_{ij}$'s are further restricted to represent Euclidean distances. Once again, these questions are not addressed here as we focus on a probabilistic approach.

We will often write statements like $X_n \le u_n$ almost surely (a.s.) for a sequence of random variables $(X_n)$ and real sequence $(u_n)$. This is a well-defined terminology of probability theory and details can be found in Stout [25] for example. We will invariably prove that

\[ \sum_{n \ge 1} Pr(X_n > u_n) < \infty , \]

which implies the above statement. Non-probabilists will be satisfied that we show $Pr(X_n > u_n) \to 0$ as $n \to \infty$. If $X_n \le u_n(1+o(1))$ a.s. and $X_n \ge u_n(1-o(1))$ a.s. then we write $X_n \sim u_n$ a.s.

First we study the k-median problem in the plane. When the points $X_1,\dots,X_n$ are uniformly distributed in a unit square and $d_{ij}$ is the Euclidean distance between $X_i$ and $X_j$, $1 \le i,j \le n$, we show that $(z_{IP} - z_{LP})/z_{IP} \sim .00284$ almost surely, for any $k$ such that $\omega \le k \le n/(\omega \log n)$ where $\omega = \omega(n) \to \infty$. (In this paper we abbreviate $f(n) \to a$ as $n \to \infty$ by $f(n) \to a$.)

In a second model, the points $X_1,\dots,X_n$ are the nodes of a random graph $G_n(p)$ where $p$ is the probability that an edge is in the graph, and $d_{ij}$ is the number of edges on the shortest path from $X_i$ to $X_j$. We assume $p \ge \omega \log n / n$ where $\omega = \omega(n) \to \infty$ (this guarantees that $G_n(p)$ is almost surely connected), and $kp^2 \ge \omega \log n / n$. We prove that $(z_{IP} - z_{LP})/z_{IP} \le 1/(1+e)$ almost surely, where $e$ is the base of natural logarithms. More specifically, if $\log_b n \le k \le n$ where $b = 1/(1-p)$, then $z_{IP} = z_{LP}$ almost surely. If $2 \le k \le \log_b n$, $kp \to \alpha$ where $0 \le \alpha \le \infty$ and $p \to \beta$ where $0 \le \beta < 1$, define $a = e$ if $\beta = 0$ and $a = (1-\beta)^{-1/\beta}$ if $\beta > 0$; then

\[ \frac{z_{IP} - z_{LP}}{z_{IP}} \sim f(\alpha,\beta) \quad \text{almost surely, where} \quad f(\alpha,\beta) = \frac{1 - (1-\alpha)^{+} a^{\alpha}}{1 + a^{\alpha}} . \]

(The maximum of this function is attained when $\alpha = 1$ and $\beta = 0$. When $\alpha = 0$ or $\infty$ the function takes the value 0.)


We also analyze the k-median problem on random trees and on another model where it is assumed that the $d_{ij}$'s are independently and uniformly distributed on [0,1].

In  section  6,  we  put  our  probabilistic  results  in  perspective  by 

presenting  extensive  computational  experiments. 

In section 7, we show how our results for the k-median problem relate to the simple plant location problem (SPLP). In the SPLP, the data comprise $n$ points $X_1,\dots,X_n$, distances $d_{ij}$ for $1 \le i,j \le n$, and fixed costs $f_j$ associated with each point $X_j$, $1 \le j \le n$. The SPLP consists of finding a nonempty set $S \subseteq X$ that minimizes $\sum_{i=1}^{n} \min_{j \in S} d_{ij} + \sum_{j \in S} f_j$. (Note that, in this problem, $|S|$ is not restricted as in the k-median problem.) An integer programming formulation of SPLP is

\[ z_{IP} = \min \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} y_{ij} + \sum_{j=1}^{n} f_j x_j \]

subject to (2), (4) and (5). The LP relaxation is obtained by relaxing the integrality conditions (5).

In the remainder of this section we state some useful results from the literature. Our proofs use the following lemma (see Hoeffding [12]).

Lemma 1. If $Y_1,\dots,Y_n$ are independent random variables and $0 \le Y_i \le 1$ for


Proof. The program $z_{LP}(x)$ separates for each $i$ into a linear program with upper bounded variables and a single constraint.

Let $d_i = \sum_{j=1}^{n} d_{ij} y_{ij}$ where the values of $y_{ij}$ are those defined in Lemma 2. Note that $z_{LP} \le \sum_{i=1}^{n} d_i$ since this bound is derived from a primal feasible solution. This bound will be used repeatedly in our proofs where it is computed for the vector $x$ defined by $x_j = k/n$ for $j = 1,\dots,n$.
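
The primal bound can be sketched in a few lines. This is a minimal illustration under the assumption that, for fixed $x$, each row $i$ is solved greedily by filling $y_{ij}$ up to $x_j$ in order of increasing $d_{ij}$ (the natural solution of a one-constraint LP with upper bounded variables); the statement of Lemma 2 is not reproduced in this text, and the function name is hypothetical.

```python
def primal_bound(d, x):
    """Value sum_i d_i of the feasible solution built row by row:
    y_ij is raised to x_j for the nearest points first, until sum_j y_ij = 1.
    Assumes sum(x) >= 1 so that every row can be completed."""
    total = 0.0
    for row in d:
        remaining = 1.0
        for j in sorted(range(len(row)), key=lambda j: row[j]):
            y = min(x[j], remaining)   # upper bound y_ij <= x_j
            total += row[j] * y
            remaining -= y
            if remaining <= 1e-12:
                break
    return total

# Hypothetical toy instance: four points on a line, k = 2, x_j = k/n = 0.5.
pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
upper = primal_bound(d, [0.5] * 4)
```

On this instance each point splits its unit of assignment between itself (distance 0) and its neighbour (distance 1), so each $d_i$ equals 0.5.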

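The dual of the LP relaxation is

\[ z_{LP} = \max \sum_{i=1}^{n} u_i - \sum_{j=1}^{n} v_j - kw \tag{6} \]

\[ u_i - t_{ij} \le d_{ij} \quad \text{for all } i,j \]

\[ \sum_{i=1}^{n} t_{ij} - v_j - w \le 0 \quad \text{for all } j \]

\[ t_{ij}, v_j \ge 0 \quad \text{for all } i,j. \]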
For any given vector $u = (u_i : i = 1,\dots,n)$, define

\[ \rho_j(u) = \sum_{i=1}^{n} (u_i - d_{ij})^{+} \quad \text{for } j = 1,\dots,n, \]

where $a^{+}$ denotes $\max(0,a)$. Let $z_D(u) = \sum_{i=1}^{n} u_i - k \max_{j=1,\dots,n} \rho_j(u)$.


Lemma 3. $z_{LP} \ge z_D(u)$ for any vector $u$.

Proof: It can be checked that, for any given $u$, a feasible solution of (6) is obtained by setting $t_{ij} = (u_i - d_{ij})^{+}$, $v_j = 0$ and $w = \max_{j=1,\dots,n} \rho_j(u)$.
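
As a small sketch of Lemma 3, the dual bound $z_D(u)$ can be evaluated directly from its definition. The function name and the toy instance are illustrative assumptions.

```python
def dual_bound(d, u, k):
    """z_D(u) = sum_i u_i - k * max_j rho_j(u),
    with rho_j(u) = sum_i max(0, u_i - d_ij)."""
    n = len(d)
    rho = [sum(max(0.0, u[i] - d[i][j]) for i in range(n)) for j in range(n)]
    return sum(u) - k * max(rho)

# Hypothetical toy instance: four points on a line at 0, 1, 5 and 6, k = 2.
pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
lower = dual_bound(d, [1, 1, 1, 1], 2)
```

With $u_i = 1$ only the term $i = j$ contributes to $\rho_j(u)$, so $z_D(u) = 4 - 2 = 2$. This matches the value of the integer solution with one median per cluster, which certifies $z_{LP} = z_{IP}$ on this instance.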


2.  The  Euclidean  model  in  the  plane. 

This section is concerned with the following Euclidean model: $n$ points $X_1,\dots,X_n$ are chosen independently and uniformly at random in the unit square $S_0 = [0,1]^2$. The distance matrix is given by $d_{ij} = \|X_i - X_j\|$ for $1 \le i,j \le n$ where $\|\cdot\|$ denotes the Euclidean norm. We assume that

(7) $k \to \infty$ and $n/(k \log n) \to \infty$.

The following theorem was proved by Papadimitriou [22].

Theorem 1 Under the above conditions,

\[ z_{IP} \sim (.3771967\ldots)\, n/\sqrt{k} \quad \text{a.s.} \]


This result was obtained by comparing $z_{IP}$ to the value $z_C$ of finding $k$ points in $X = \{X_1,\dots,X_n\}$ that minimize the sum of the distances to a continuum of points in the unit square. Papadimitriou showed that, when (7) holds, $z_{IP} \sim z_C$ almost surely. Actually, he used a weaker notion of probabilistic convergence, but Zemel [26] showed that almost sure convergence holds as well. It should be pointed out, however, that the continuous problem yielding $z_C$ is very different from the LP relaxation. In fact, for the LP relaxation, we prove


Theorem 2 Under the above conditions,

\[ z_{LP} \sim \frac{2}{3\sqrt{\pi}}\, n/\sqrt{k} \quad \text{a.s.} \]

where $2/(3\sqrt{\pi}) = .3761264\ldots$
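
The two constants of Theorems 1 and 2 determine the asymptotic relative gap quoted in the introduction. The following short check is only arithmetic on the stated constants; the variable names are illustrative.

```python
import math

C_IP = 0.3771967                         # constant of Theorem 1 (as quoted)
C_LP = 2.0 / (3.0 * math.sqrt(math.pi))  # constant of Theorem 2
gap = (C_IP - C_LP) / C_IP               # limiting value of (z_IP - z_LP)/z_IP
```

The computed gap is about .00284, i.e. slightly less than .3 percent.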


Our method of proof consists of conjecturing a near-optimal solution to the LP relaxation and a near-optimal solution to its dual. Then we show that, almost surely, these lower and upper bounds on $z_{LP}$ are the same, up to small order terms. The probabilistic arguments are based on the estimates of the tails of the binomial distribution given in Lemma 1.

The proof of Theorem 2 will actually provide a constructive way of obtaining an upper bound $z_{LP}(x)$ and a lower bound $z_D(u)$ on the optimum value of the LP relaxation of the k-median problem.


Corollary 1. Let $x_j = k/n$ for $j = 1,\dots,n$ and $u_i = 1/\sqrt{k\pi}$ for $i = 1,\dots,n$. Then $z_D(u) \le z_{LP} \le z_{LP}(x)$ and, under condition (7),

\[ z_D(u) \sim z_{LP} \quad \text{almost surely,} \]

\[ z_{LP}(x) \sim z_{LP} \quad \text{almost surely.} \]
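
The sandwich $z_D(u) \le z_{LP} \le z_{LP}(x)$ can be observed numerically on a random Euclidean instance. This is a hedged sketch: the choice $u_i = 1/\sqrt{k\pi}$ follows the reconstruction of the corollary, the greedy row-by-row assignment stands in for the $y_{ij}$ of Lemma 2, and the function name, instance size and seed are arbitrary assumptions.

```python
import math
import random

def euclidean_sandwich(n, k, seed=0):
    """Return (z_D(u), z_LP(x)) for n uniform points in the unit square,
    with u_i = 1/sqrt(k*pi) and x_j = k/n."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    d = [[math.dist(p, q) for q in pts] for p in pts]

    # Dual bound: sum_i u_i - k * max_j sum_i (u_i - d_ij)^+.
    u = 1.0 / math.sqrt(k * math.pi)
    rho_max = max(sum(max(0.0, u - d[i][j]) for i in range(n)) for j in range(n))
    lower = n * u - k * rho_max

    # Primal bound: each point spreads one unit over its nearest points, k/n each.
    upper = 0.0
    for row in d:
        remaining = 1.0
        for dist in sorted(row):
            y = min(k / n, remaining)
            upper += dist * y
            remaining -= y
            if remaining <= 1e-12:
                break
    return lower, upper

lower, upper = euclidean_sandwich(200, 10)
```

Weak duality guarantees `lower <= upper` for every instance; how close the two values are for moderate $n$ depends on boundary effects that vanish only asymptotically.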


In addition, in [22], Papadimitriou gives a heuristic which almost surely provides a solution with value asymptotic to $z_{IP}$. The complexity of the heuristic is $O(n \log n)$. Combining this result with the fact that $z_D(u)$ can be computed in linear time, we have a very fast procedure which will almost surely
(i) find a solution with a value close to the optimum,

(ii) prove that the value of this solution is within .3% of the optimum.

Finding the exact optimum is much more expensive as will be shown in Theorem 3. But first we give the proof of Theorem 2.

Proof of Theorem 2. To obtain a probabilistic upper bound on $z_{LP}$, we are first going to consider the LP solution

\[ x_j = k/n \quad \text{for } j = 1,\dots,n \]

and the values of $y_{ij}$ as defined in Lemma 2. Let $d_i = \sum_{j=1}^{n} d_{ij} y_{ij}$ for $i = 1,\dots,n$. We must get a probabilistic estimate of $d_i$ for $i = 1,\dots,n$. Let $\epsilon = (k \log n / n)^{1/3}$, $r = \left(\frac{1}{(1-\epsilon)k\pi}\right)^{1/2}$ and let $S_r$ be the square $[r,1-r]^2$. We show first

\[ Pr\left(d_i > \frac{2}{3\sqrt{k\pi}}(1+o(1)) \;\middle|\; X_i \in S_r\right) \le 2e^{-2\epsilon^2 n/(9k)} \tag{8} \]

\[ Pr\left(d_i > \frac{4}{3\sqrt{k\pi}}(1+o(1)) \;\middle|\; X_i \notin S_r\right) \le 2e^{-2\epsilon^2 n/(9k)} \tag{9} \]

If $X_i \in S_r$, then a circle $C_i$ of radius $r$ centered at $X_i$ is entirely contained in $S_0$. The number $N$ of points lying in this circle stochastically dominates the binomial $B(n, \pi r^2)$ (since $X_i \in C_i$). We define independent random variables $W_j$, $j = 1,2,\dots,n$ as follows:

\[ W_j = \begin{cases} d_{ij} & \text{if } X_j \in C_i \\ 0 & \text{otherwise.} \end{cases} \]

Note that $E(W_j) = 2\pi r^3/3$ ($j \ne i$). If $N \ge \lceil n/k \rceil$ then $d_i \le \frac{k}{n} \sum_{j=1}^{n} W_j$. By Lemma 1,

\[ Pr(N < \lceil n/k \rceil) = Pr(N < (1-\epsilon) n \pi r^2) \le e^{-\frac{\epsilon^2}{2} n \pi r^2} . \]

Furthermore, if $\bar{W}_j = W_j / r \in [0,1]$, then by Lemma 1,

\[ Pr\left(\sum_{j=1}^{n} \bar{W}_j > (1+\epsilon)(n-1)\frac{2\pi r^2}{3}\right) \le e^{-\frac{\epsilon^2}{3}(n-1)\frac{2\pi r^2}{3}} \]

and (8) follows.

To prove (9), we note that if $X_i \in S_0 - S_r$, we can at worst find a quadrant of a circle centered at $X_i$ with radius $2r$ and contained entirely within $S_0$. The area of this quadrant is $\pi(2r)^2/4$ and we apply the same method as above with $E(W) = 4\pi r^3/3$.

We are now ready to bound $z_{LP}$.


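\[ z_{LP} \le \sum_{X_i \in S_r} d_i + \sum_{X_i \in S_0 - S_r} d_i . \]

By Lemma 1,

\[ Pr\{ |X \cap S_r| < n(1-2r)^2(1-\epsilon) \} \le e^{-\frac{\epsilon^2}{2} n (1-2r)^2} \]

and thus

\[ Pr\left\{ z_{LP} > (1+o(1)) \left( (1-2r)^2 n \frac{2}{3\sqrt{k\pi}} + (1-(1-2r)^2) n \frac{4}{3\sqrt{k\pi}} \right) \right\} \le (2n+1) e^{-2\epsilon^2 n/(9k)} \]

giving

\[ z_{LP} \le (1+o(1)) \frac{2n}{3\sqrt{k\pi}} \quad \text{almost surely.} \tag{10} \]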

To obtain a probabilistic lower bound on $z_{LP}$, we consider the dual problem (6). Let $u_i = r$ for $i = 1,\dots,n$. Then by Lemma 3

\[ z_{LP} \ge \sum_{i=1}^{n} u_i - k \max_j \left( \sum_{i=1}^{n} (u_i - d_{ij})^{+} \right) . \tag{11} \]

For fixed $j$, consider random variables $U_i = (u_i - d_{ij})^{+}$. Setting $u_i = r$ we find $E(U_i) = \pi r^3/3$ for $i \ne j$ and $X_j \in S_r$, whereas these values decrease for points $X_j \in S_0 - S_r$. Rescaling $U_i$ to [0,1] and applying Lemma 1 to $X_j \in S_r$ we find

\[ Pr\left( \sum_{i=1}^{n} U_i > (1+\epsilon) \frac{n \pi r^3}{3} \right) \le e^{-\epsilon^2 n \pi r^2 / 9} \]

and thus for $k = o(n/\log n)$ we have

\[ \max_j \left( \sum_{i=1}^{n} U_i \right) \le (1+\epsilon) \frac{n \pi r^3}{3} \quad \text{a.s.} \]

giving

\[ z_{LP} \ge nr - (1+\epsilon) k n \pi r^3 / 3 = (1-o(1)) \frac{2n}{3\sqrt{k\pi}} \quad \text{a.s.} \]

Combining this with (10) yields the theorem.


One might expect then that an LP-based branch and bound procedure performs well, since $z_{LP}$ provides a good bound. However, we can prove

Theorem 3. Assume $k/\log n \to \infty$ and $n/(k \log n) \to \infty$. Then there exists a constant $\alpha > 0$ such that a branch and bound procedure that branches by fixing a variable $x_j$ to 0 or 1 at each node of the search tree which is not pruned and uses the LP bound to prune the search tree will almost surely explore at least $n^{\alpha k}$ nodes.


Proof: Each node of the branch and bound tree is associated with two sets $J_0$ and $J_1$ where $J_t = \{j : x_j \text{ is fixed at } t \text{ in the associated subproblem}\}$ for $t = 0,1$. Let $z_{LP}(J_0,J_1)$ denote the LP bound computed at this node, i.e. the value of $z_{LP}$ when we make the restriction $x_j = t$ for $j \in J_t$, $t = 0,1$. We prove the theorem by showing that for some constants $\delta, \gamma > 0$ (to be determined) the following holds almost surely:

For any $J_0, J_1 \subseteq \{1,\dots,n\}$ such that $J_0 \cap J_1 = \emptyset$, $|J_0| \le \delta n/(k \log n)$, $|J_1| \le \gamma k$, we have

\[ z_{LP}(J_0,J_1) \le .3769\, \frac{n}{\sqrt{k}} . \]
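\[ Pr\left( z_{LP}(J_0,J_1) > \frac{2n}{3\sqrt{\pi k}}(1+3\epsilon) \right) \le (2n+1) e^{-2\epsilon^2 n/(9k)} . \]

Since $|F| \le n^{\delta n/(k \log n) + \gamma k}$ we find

\[ Pr\left( \exists (J_0,J_1) \in F : z_{LP}(J_0,J_1) > (1+3\epsilon) \frac{2n}{3\sqrt{\pi k}} \right) \le (2n+1)\, n^{\delta n/(k \log n) + \gamma k}\, e^{-2\epsilon^2 n/(9k)} . \]

Taking $\delta = \epsilon^2/5$, $\gamma = \epsilon$ and $\epsilon$ sufficiently small that $\frac{2(1+3\epsilon)}{3\sqrt{\pi(1-\epsilon)}} < .3769$ yields

\[ \max \{ z_{LP}(J_0,J_1) : (J_0,J_1) \in F \} \le .3769\, \frac{n}{\sqrt{k}} \quad \text{almost surely.} \]

Any $\alpha < \gamma$ can be used to give the theorem.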




3.  A  Graphical  Model 

This section is concerned with the following graphical model. Let $G$ be a random graph with $n$ nodes, where each edge occurs independently with probability $p$. Let $X_1,\dots,X_n$ be the nodes of the graph and $d_{ij}$ the minimum number of edges on a path joining $X_i$ to $X_j$ for $1 \le i,j \le n$, where the minimum is taken over all paths joining $X_i$ to $X_j$. Thus $d_{ij}$ is the shortest distance between $X_i$ and $X_j$, assuming that all edges have length one.

Let $q = 1-p$ and $b = 1/q$. The main result of this section is the following theorem.


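Theorem 4

(a) Consider $(1+\epsilon) \log_b n \le k \le n$, where $\epsilon > 0$ is fixed.

(i) If $p n^{\delta} \to \infty$ for all $\delta > 0$ fixed, then $z_{IP} = z_{LP}$ almost surely.

(ii) In general, we only have $\lim_{n \to \infty} Pr(z_{IP} = z_{LP}) = 1$.

(b) Consider $2 \le k \le \log_b n$ and $p \min(1,kp) \ge \omega \log n / n$, where $\omega \to \infty$. Then $\frac{z_{IP} - z_{LP}}{z_{IP}} \le \frac{1}{1+e}$ almost surely. (Note that the condition in (a)(i) is satisfied.) In addition, if we let $kp \to \alpha$, $0 \le \alpha \le \infty$, and $p \to \beta$, $0 \le \beta < 1$, where $\alpha$ and $\beta$ are fixed, then

\[ \frac{z_{IP} - z_{LP}}{z_{IP}} \sim \frac{1 - (1-\alpha)^{+} a^{\alpha}}{1 + a^{\alpha}} \quad \text{almost surely,} \]

where $a = e$ if $\beta = 0$ and $a = (1-\beta)^{-1/\beta}$ if $\beta > 0$.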


Figure 1. $(z_{IP} - z_{LP})/z_{IP}$ as a function of $kp$ when $2 \le k \le \log_b n$.
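
The limiting ratio of Theorem 4(b) is easy to tabulate. The sketch below assumes the form $(1 - (1-\alpha)^{+} a^{\alpha})/(1 + a^{\alpha})$ with $a = e$ for $\beta = 0$ and $a = (1-\beta)^{-1/\beta}$ otherwise; the function name is hypothetical.

```python
import math

def limiting_gap(alpha, beta):
    """Limiting value of (z_IP - z_LP)/z_IP when kp -> alpha and p -> beta."""
    a = math.e if beta == 0 else (1.0 - beta) ** (-1.0 / beta)
    positive_part = max(0.0, 1.0 - alpha)       # (1 - alpha)^+
    return (1.0 - positive_part * a ** alpha) / (1.0 + a ** alpha)

peak = limiting_gap(1.0, 0.0)        # equals 1/(1+e), about 0.269
example = limiting_gap(0.8, 0.4)     # e.g. k = 2, p = 0.4, so q^k = 0.36
```

For $k = 2$ and $p = 0.4$ the formula reduces to $(q^k - (1-kp))/(1+q^k) = 0.16/1.36$, which agrees with the expression obtained from Lemmas 4 and 5.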


Proof of Theorem 4(a)

(i) This part of the theorem is a careful phrasing of a known result and is easy to prove. As $d_{ij} \ge 1$ for $i \ne j$, we must have

\[ z_{IP} \ge z_{LP} \ge n-k . \tag{15} \]

(i) follows from (15) if we can show that

\[ z_{IP} = n-k \quad \text{almost surely.} \]

But $z_{IP} = n-k$ if and only if there is a set $K \subseteq X$, $|K| = k$, such that, for any $X_i \in X-K$, there exists $X_j \in K$ such that $X_i$ and $X_j$ are joined by an edge of $G_n(p)$, i.e., $K$ is a dominating set.

Let $m = \lceil 2/\epsilon \rceil$ and $K_i = \{X_{(i-1)k+1}, \dots, X_{ik}\}$ for $i = 1,2,\dots,m$. If none of $K_1,K_2,\dots,K_m$ are dominating then one of the following events occurs:

$E_0 = \{\exists\, 1 \le r \ne s \le m$ and $X_j \in K_r$ such that $X_j$ is not adjacent in $G_n(p)$ to any vertex of $K_s\}$

$E_1 = \bigcap_{i=1}^{m} F_i$ where $F_i = \{\exists\, X_j \in X - \bigcup_{l=1}^{m} K_l$ such that $X_j$ is not adjacent in $G_n(p)$ to any vertex of $K_i\}$

Now

\[ Pr(E_0) \le m^2 k (1-p)^k \le m^2 k\, n^{-(1+\epsilon)} \le \frac{m^2 \log n}{n^{1+\epsilon} p} = O(n^{-(1+\epsilon/2)}) \]

as $\log_b n \le (\log n)/p$ and by assumption.

Furthermore,

\[ Pr(E_1) = \prod_{i=1}^{m} Pr(F_i) \quad \text{since the } F_i \text{ are independent} \]
\[ \le ((n-km)(1-p)^k)^m \le n^{-\epsilon m} \]

and (i) follows.

(ii) $Pr(K_1$ is not a dominating set$) \le (n-k)(1-p)^k \le n^{-\epsilon} \to 0$. □

Our proof of Theorem 4(b) will use the next two lemmas.

Lemma 4 Consider $1 \le k \le \log_b n$. Assume $p \min(1,kp) \ge \omega \log n / n$, where $\omega \to \infty$. Then,

\[ z_{IP} = (1+o(1))(n-k)(1+q^k) \quad \text{almost surely.} \]

Proof: For $K \subseteq X$, let $N(K)$ be the neighbor set of $K$, i.e.

$N(K) = \{X_j \in X-K$: there exists an edge joining $X_j$ to a node of $K\}$.

We have

\[ z_{IP} \ge \min_{|K|=k} \left( |N(K)| + 2(n-k-|N(K)|) \right) = 2(n-k) - \max_{|K|=k} |N(K)| . \]

We prove the lemma by showing that

\[ \max_{|K|=k} |N(K)| = (1+o(1))(n-k)(1-q^k) \quad \text{almost surely, and} \tag{16} \]

\[ z_{IP} = (1+o(1)) \min_{|K|=k} \left( |N(K)| + 2(n-k-|N(K)|) \right) \quad \text{almost surely.} \tag{17} \]

Consider a fixed $K \subseteq X$, $|K| = k$. The quantity $|N(K)|$ is distributed as $B(n-k, 1-q^k)$. Thus, by Lemma 1, for any small $\epsilon > 0$

\[ Pr[\,|N(K)| \le (1-\epsilon)(n-k)(1-q^k)\,] \le e^{-\frac{1}{2}\epsilon^2 (n-k)(1-q^k)} \quad \text{and} \]

\[ Pr[\,|N(K)| \ge (1+\epsilon)(n-k)(1-q^k)\,] \le e^{-\frac{1}{3}\epsilon^2 (n-k)(1-q^k)} . \]

Thus we have

\[ Pr[\max_{|K|=k} |N(K)| \le (1-\epsilon)(n-k)(1-q^k)] \le e^{-\frac{1}{2}\epsilon^2 (n-k)(1-q^k)} \tag{18} \]

\[ Pr[\max_{|K|=k} |N(K)| \ge (1+\epsilon)(n-k)(1-q^k)] \le \binom{n}{k} e^{-\frac{1}{3}\epsilon^2 (n-k)(1-q^k)} \tag{19} \]

To obtain (16) we put $\epsilon = 2\left(k \log\frac{n}{k} \Big/ (n-k)(1-q^k)\right)^{1/2}$. We can use $\binom{n}{k} \le \left(\frac{ne}{k}\right)^k$ in (19). Then the right hand sides in (18) and (19) both $\to 0$ sufficiently fast. Thus (16) is proved, provided that $\epsilon < 1$.


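We consider two cases. Let $0 < a < 1$ be a constant.

When $kp \le a$, $q^k = (1-p)^k = [(1-p)^{1/p}]^{kp} \le \left(\frac{1}{e}\right)^{kp} \le 1 - kp\left(1 - \frac{a}{2}\right)$. So

\[ \frac{\epsilon^2}{4} \le \frac{k \log\frac{n}{k}}{(n-k)\,kp\,(1 - \frac{a}{2})} \to 0 \quad \text{since } \log n/(np) \to 0 . \]

When $kp > a$, $q^k = (1-p)^k \le e^{-kp} \le e^{-a} < 1$. So

\[ \frac{\epsilon^2}{4} \le \frac{k \log\frac{n}{k}}{(n-k)(1-e^{-a})} \to 0 \quad \text{since } \frac{\log x}{x} \to 0 \text{ when } x \to \infty . \]

This completes the proof of (16).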

To prove (17) it suffices to show that, almost surely,

(20) every node in $X-K_1$ is joined by a path of length $\le 2$ to at least one node of $K_1$ where $K_1 = \{X_1,X_2,\dots,X_k\}$.

The events

$A(j) = \{X_j$ is joined to $K_1$ by an edge$\}$

$B(j) = \{X_j$ is joined to $K_1$ via a node $X_t \ne X_j$, $X_t \notin K_1\}$

are independent for fixed $j$ because they have no edges in common.

\[ Pr(A(j)) = 1 - (1-p)^k = p_0, \text{ say} \]
\[ Pr(B(j)) = 1 - (1-p_0 p)^{n-k-1} . \]

Hence, if $N$ is the number of nodes not within distance 2 of $K_1$,

\[ Pr(N > 0) \le (n-k)(1-p)^k (1-p_0 p)^{n-k-1} \le (n-k)(1-p_0 p)^{n-1} \le n e^{-(n-1)p_0 p} . \]

If $kp \ge 1$ then $p_0 \ge 1-e^{-1}$ and so $Pr(N > 0) \le n^{-\omega/2}$ using $p \ge \omega \log n/n$.

If $kp < 1$ then $(1-p)^k \le 1 - kp + \frac{k^2p^2}{2}$ and hence $p_0 \ge \frac{1}{2}kp$ and then $Pr(N > 0) \le n^{-\omega/3}$ using $kp^2 \ge \omega \log n/n$.

This proves (20) and therefore (17) and the lemma.


Lemma 5 Consider $2 \le k \le \log_b n$. Assume $p \ge \omega \log n/n$ and $kp^2 \ge \omega \log n/n$, where $\omega \to \infty$. Then

\[ z_{LP} = \max(n-k,\; 2n - nkp(1+o(1))) \quad \text{almost surely.} \]

Proof: Given a node $X_i$, let $N_1(i) = \{X_j : d_{ij} = 1\}$ and $N_2(i) = \{X_j : d_{ij} = 2\}$. First we give probabilistic estimates of $|N_1(i)|$ and $|N_2(i)|$. We will show

\[ \min_i |N_1(i)| = (1-o(1))np \quad \text{almost surely,} \tag{21} \]

\[ \max_i |N_1(i)| = (1+o(1))np \quad \text{almost surely, and} \tag{22} \]

\[ \min_i |N_2(i)| \ge \min\left(\frac{n}{k},\; (1-o(1))nq\right) \quad \text{almost surely.} \tag{23} \]


Note that $|N_1(i)|$ is distributed as $B(n-1,p)$. So, by Lemma 1,

\[ Pr(\min_i |N_1(i)| \le (1-\epsilon)(n-1)p) \le n\, e^{-\frac{1}{2}\epsilon^2 (n-1)p} \]

\[ Pr(\max_i |N_1(i)| \ge (1+\epsilon)(n-1)p) \le n\, e^{-\frac{1}{3}\epsilon^2 (n-1)p} \]

Putting $\epsilon = 3(\log n/(n-1)p)^{1/2}$ yields (21) and (22).
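
To see why this choice of $\epsilon$ works, the two union bounds can be evaluated numerically. The sketch assumes that Lemma 1 is the usual multiplicative tail bound with constants 1/2 (lower tail) and 1/3 (upper tail); the function name is hypothetical.

```python
import math

def degree_tail_bounds(n, p):
    """Return (eps, lower_tail, upper_tail) for the degree estimates,
    with eps = 3 * sqrt(log n / ((n-1) p)) and assumed constants 1/2 and 1/3."""
    mu = (n - 1) * p                      # mean of B(n-1, p)
    eps = 3.0 * math.sqrt(math.log(n) / mu)
    lower_tail = n * math.exp(-eps ** 2 * mu / 2.0)   # bound for the minimum degree
    upper_tail = n * math.exp(-eps ** 2 * mu / 3.0)   # bound for the maximum degree
    return eps, lower_tail, upper_tail

eps, lower_tail, upper_tail = degree_tail_bounds(10000, 0.05)
```

Under these assumptions the upper-tail bound equals $n \cdot n^{-3} = n^{-2}$ and the lower-tail bound equals $n^{-3.5}$, both summable in $n$, which is what the almost sure statements require. The condition $p \ge \omega \log n/n$ ensures $\epsilon \to 0$.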

Now consider $|N_2(i)|$. We will assume $p \to 0$ (otherwise $N_1(i)$ is a dominating set by Theorem 4(a), and (23) follows). Conditional on $|N_1(i)|$, the quantity $|N_2(i)|$ is distributed as $B(n_2,p_2)$, where $n_2 = n - |N_1(i)| - 1$ and $p_2 = 1 - (1-p)^{|N_1(i)|}$. By Lemma 1,

\[ Pr(\min_i |N_2(i)| \le (1-\epsilon)n_2p_2) \le n\, e^{-\frac{1}{2}\epsilon^2 n_2 p_2} . \]

Set $\epsilon = 3(\log n/n_2p_2)^{1/2}$. We have to show $\epsilon < 1$. Note that $n_2 = (1-o(1))n$ and

\[ p_2 = 1 - (1-p)^{(1+o(1))np} \ge 1 - e^{-(1+o(1))np^2} \quad \text{almost surely.} \]

If $np^2 \ge \delta > 0$ where $\delta$ is fixed, then

\[ \frac{\epsilon^2}{9} \le \frac{\log n}{(1+o(1))n(1-e^{-\delta})} \to 0 . \]

If $np^2 = o(1)$, then

\[ \frac{\epsilon^2}{9} \sim \frac{\log n}{n^2p^2} = \frac{1}{\log n}\left(\frac{\log n}{np}\right)^2 \to 0 . \]

So we have just shown that, almost surely,

\[ \min_i |N_2(i)| \ge (1-o(1))n_2p_2 . \]

Next we will use the fact that $kp^2 \ge \omega \log n/n$ to show $n_2p_2 \ge n/k$ almost surely.

If $np^2 \ge \delta$, $0 < \delta < 1$ fixed, then almost surely

\[ n_2p_2 \ge (1+o(1))n(1-e^{-\delta}) \ge \frac{n}{k} \quad \text{for } k \ge 2 \text{ and } \delta \text{ close enough to 1.} \]

If $np^2 < \delta < 1$, then $1 - e^{-(1+o(1))np^2} \ge np^2(1 - \frac{\delta}{2})$. So

\[ n_2p_2 \ge (1+o(1))n^2p^2\left(1 - \frac{\delta}{2}\right) \ge (1+o(1)) \frac{\omega n \log n}{k}\left(1 - \frac{\delta}{2}\right) \ge \frac{n}{k} \quad \text{almost surely.} \]

This completes the proof of (23).

Now we are ready to get a probabilistic estimate of $z_{LP}$. First we obtain an upper bound by considering the solution

(24) $x_j = k/n$ for $j = 1,\dots,n$ and $y_{ij}$ defined in Lemma 2.

Let $\delta = \min_i |N_1(i)|$ be the minimum degree of $G_n(p)$. Note that, if $\delta \ge \frac{n}{k} - 1$, then $z_{LP} = n-k$. For, using the solution (24), we have $d_i = \sum_j d_{ij}y_{ij} = 1 - \frac{k}{n}$ for $i = 1,\dots,n$. On the other hand, if $\delta < \frac{n}{k} - 1$, then $d_i \le \frac{k}{n}\delta + 2\frac{k}{n}\left(\frac{n}{k} - 1 - \delta\right)$. ($y_{ij}$ only takes positive values for points $X_j$ at distance one or two of $X_i$ since, by (23), the number of points at distance 2 is at least $\min(\frac{n}{k}, (1-o(1))nq)$ which is more than the $\frac{n}{k} - 1 - \delta$ points needed.) Therefore $z_{LP} \le \sum_{i=1}^{n} d_i \le 2n - k\delta$, almost surely.

To  obtain  a  probabilistic  lower  bound  for  zLp  we  consider  the  dual  bound 
given  by  Lemma  3.  We  put  ui  =  2  -  ^  for  i=1,...,n  and  let  A  denote  the 
maximum  degree  of  Gn(p).  Then 


zLp  >  n(2  -  ^)  -  kA(1  -  j^)  =  2n  -  (1+o(1))nkp  almost  surely. 


This  completes  the  proof  of  Lemma  5. 


Proof of Theorem 4(b)

It follows from Lemmas 4 and 5 that

\[ \frac{z_{IP} - z_{LP}}{z_{IP}} \sim \frac{(1+q^k) - \max(1, 2-kp)}{1+q^k} = \frac{q^k - (1-kp)^{+}}{q^k + 1} \quad \text{almost surely.} \]

Setting $a = (1-p)^{-1/p}$ and $kp = \alpha$, we get

\[ \frac{z_{IP} - z_{LP}}{z_{IP}} \sim \frac{1 - (1-\alpha)^{+} a^{\alpha}}{1 + a^{\alpha}} \quad \text{almost surely.} \]

It is easy to check that the maximum of this function is achieved when $p \to 0$ and $\alpha = 1$. Then its value is $\frac{1}{1+e}$.


An interesting range of parameters which is not considered in Theorem 4 is the case $2 \le k \le \log_b n$ and $p \ge \omega \log n/n > kp^2$ where $\omega \to \infty$. In this range, the expressions for $z_{IP}$ and $z_{LP}$ are more complicated than those found in Lemmas 4 and 5. However we conjecture that

\[ \frac{z_{IP} - z_{LP}}{z_{IP}} \to 0 \quad \text{almost surely.} \]


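In the range covered by Theorem 4, it is easy to identify conditions under which the ratio $\frac{z_{IP} - z_{LP}}{z_{IP}}$ is almost surely bounded away from 0. For example, consider

(25) $\epsilon \le kp \le 1/\epsilon$, $k \ge 2$ and

(26) $(\omega \log n/n)^{1/2} \le p \le 1-\epsilon$

where $\omega \to \infty$ and $0 < \epsilon < 1$ is fixed.

Then $k \log b = kp\left(1 + \frac{p}{2} + \frac{p^2}{3} + \dots\right) \le \frac{kp}{1-p} \le \frac{1}{\epsilon^2}$. So $k \le \log_b n$ for $n$ large enough and, by Theorem 4(b), there is a fixed value $f(\epsilon) > 0$ such that

\[ \frac{z_{IP} - z_{LP}}{z_{IP}} \ge f(\epsilon) \quad \text{almost surely.} \]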
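
The chain $k \log b \le kp/(1-p) \le 1/\epsilon^2$ used above can be checked on sample values. This is a small sketch; the function name is hypothetical and the lower bound on $p$ in (26), which involves $n$, is not checked.

```python
import math

def k_log_b_chain(k, p, eps):
    """Return (k log b, kp/(1-p), 1/eps^2) with b = 1/(1-p),
    for parameters satisfying eps <= kp <= 1/eps and p <= 1 - eps."""
    if not (eps <= k * p <= 1.0 / eps and p <= 1.0 - eps):
        raise ValueError("parameters violate (25) or the upper bound in (26)")
    k_log_b = k * math.log(1.0 / (1.0 - p))   # equals kp(1 + p/2 + p^2/3 + ...)
    return k_log_b, k * p / (1.0 - p), 1.0 / eps ** 2

lhs, mid, rhs = k_log_b_chain(4, 0.5, 0.5)
```

Since $k \le \log_b n$ is equivalent to $k \log b \le \log n$, the bound $k \log b \le 1/\epsilon^2$ shows that the condition holds as soon as $\log n \ge 1/\epsilon^2$.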


In  addition,  we  can  show  that,  under  these  conditions,  a  branch  and  bound 
algorithm  based  on  the  LP  bound  z^p  almost  surely  requires  close  to  complete 
enumeration. 


Theorem 5 Assume (25) and (26). A branch and bound procedure that branches by fixing a variable $x_j$ to 0 or 1 at each node of the search tree which is not pruned, and uses the LP bound to prune the search tree, will almost surely expand at least $n^{(1-o(1))(k-2)}$ nodes. (The number of feasible solutions of the k-median problem is $\binom{n}{k} = n^{(1-o(1))k}$.)


Proof :  We  first  note  that,  under  the  above  assumptions,  e  <  klog  b  < 

t 

and  therefore 


In addition, the assumptions of Lemma 4 hold and $k = o(n)$ so that

(29) $z_{IP} \ge (1-o(1))\, n(1+q^k)$ almost surely.

Let $z_{LP}(J_0,J_1)$ be the LP value of the subproblem where $J_0 = \{j : x_j$ is fixed to 0$\}$ and $J_1 = \{j : x_j$ is fixed to 1$\}$.

Let $\alpha < 1$ and $\beta > 0$ be fixed. We prove the theorem by showing that, for $\beta$ chosen small enough, the following property holds almost surely.

(30) For any $J_0, J_1 \subseteq \{1,2,\dots,n\}$ such that $J_0 \cap J_1 = \emptyset$, $|J_0| \le \lceil \beta n \rceil$ and $|J_1| \le \lceil \alpha k \rceil$,

(31) $z_{LP}(J_0,J_1) < z_{IP}$.

This implies that the algorithm must explore at least

\[ \binom{\lceil \beta n \rceil + \lceil \alpha k \rceil}{\lceil \alpha k \rceil} \ge \left(\frac{\beta n}{\alpha k}\right)^{\alpha k} = n^{(1-o(1))\alpha k} \text{ nodes.} \tag{32} \]

To verify (32), imagine that setting $x_j = 0$ means branching to the left and setting $x_j = 1$ means branching to the right. (30)-(31) imply that any tree contains all possible paths which make $\lceil \alpha k \rceil$ right branches and $\lceil \beta n \rceil$ left branches. The number of such paths is precisely the left hand side of (32).
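
The path count can be illustrated numerically. The sketch below compares the exact number of left/right branching patterns with the elementary lower bound $(a/b)^b$, where $a$ and $b$ denote the numbers of left and right branches; the function name and sample parameters are assumptions.

```python
import math

def forced_paths(n, k, alpha, beta):
    """Return (exact, lower) where exact = C(a+b, b) counts the paths with
    a = ceil(beta*n) left branches and b = ceil(alpha*k) right branches,
    and lower = (a/b)**b is a simple lower bound on that count."""
    a = math.ceil(beta * n)
    b = math.ceil(alpha * k)
    return math.comb(a + b, b), (a / b) ** b

exact, lower = forced_paths(1000, 10, 0.5, 0.0125)
```

The bound holds because $\binom{a+b}{b} = \prod_{i=1}^{b} \frac{a+i}{i}$ and each factor is at least $\frac{a+b}{b} \ge \frac{a}{b}$.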


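We now turn to the proof of (31). As increasing $J_0$ or $J_1$ only serves to increase $z_{LP}$ we can restrict our attention to $|J_0| = \lceil \beta n \rceil$ and $|J_1| = \lceil \alpha k \rceil$.

Using Lemma 1 we can easily prove that the following holds almost surely for $G_n(p)$:

(33) $J \subseteq \{1,2,\dots,n\}$ and $|J| = \lceil \alpha k \rceil$ implies $|N(J)| \ge (1-o(1))\, n(1-q^{\alpha k})$ (see (18)).

Furthermore, it is easy to see that

(34) $\mathrm{diam}(G_n(p)) = 2$ almost surely,

where diam refers to the diameter of $G_n(p)$.

Indeed $Pr$(there exist $i,j \in \{1,2,\dots,n\}$ such that $i,j$ are not joined by a path of length 2)

\[ \le n^2 (1-p^2)^{n-2} \le n^2 e^{-\omega \log n\,(n-2)/n} \to 0 . \]

Thus (34) is proved ($Pr[\mathrm{diam}(G_n(p)) = 1] = p^{\binom{n}{2}} \to 0$). To obtain an upper bound on $z_{LP}(J_0,J_1)$ let

\[ x_j = \begin{cases} 0 & \text{if } j \in J_0 \\ 1 & \text{if } j \in J_1 \\ \gamma & \text{if } j \notin J_0 \cup J_1 \end{cases} \]

where $\gamma = (k - \lceil \alpha k \rceil)/(n - \lceil \beta n \rceil - \lceil \alpha k \rceil)$.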

The values for $y_{ij}$ are then chosen as follows:

$i \in J_1$:  $y_{ii} = 1$ and $y_{ij} = 0$, $j \ne i$;

$i \in N(J_1)$:  $y_{it} = 1$ and $y_{ij} = 0$, $j \ne t$, where $t$ is a node of $J_1 \cap N(i)$;

$i \notin J_1 \cup N(J_1)$:  the values are as defined in Lemma 2.

With this solution we find, using (34), that

$d_i = 0$  if $i \in J_1$,
$d_i = 1$  if $i \in N(J_1)$,
$d_i \le \gamma(\delta-s_i) + 2(1-\gamma(\delta-s_i))$  if $i \notin J_1 \cup N(J_1)$,

where $s_i = |N(i) \cap J_0|$, $\delta$ is the minimum node degree and $\Delta$ is the maximum node degree in $G_n(p)$.

To compute an upper bound on $z_{LP}$, we will distinguish between the cases $\gamma\delta \le 1$ and $\gamma\delta > 1$.

First assume that $\gamma\delta > 1$. We use the bound

$d_i \le \gamma(\delta-s_i) + 2(1-\gamma(\delta-s_i)) \le 1+\gamma s_i$.

Then

$z_{LP} \le |N(J_1)| + \sum_{i \notin J_1 \cup N(J_1)} (1+\gamma s_i)$

$\le |N(J_1)| + n - |N(J_1)| + \gamma\Delta|J_0|$

$= n + (1+o(1))\,\frac{\beta(k-\lceil\alpha k\rceil)}{1-\beta}\,p\,n$.

Since kp is bounded above by a constant as a consequence of (25), we simply choose $\beta$ small enough to get our bound on $z_{LP}$. Then (31) follows from (28) and (29).

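Now assume that $\gamma\delta \le 1$. We use the bound

$d_i \le \gamma(\delta-s_i) + 2(1-\gamma(\delta-s_i)) = 2 - \gamma\delta + \gamma s_i$.

Then

$z_{LP} \le |N(J_1)| + \sum_{i \notin J_1 \cup N(J_1)} (2 - \gamma\delta + \gamma s_i)$

$\le |N(J_1)| + (2-\gamma\delta)(n - |N(J_1)|) + \gamma\Delta|J_0|$

$= (2-\gamma\delta)n - (1-\gamma\delta)|N(J_1)| + \gamma\Delta|J_0|$

$\le \left[1 + q^{\lceil\alpha k\rceil}\left(1 - \frac{k-\lceil\alpha k\rceil}{1-\beta}\,p\right) + \frac{\beta(k-\lceil\alpha k\rceil)}{1-\beta}\,p\right] n + o(n)$

where the last inequality follows from the relations

$|N(J_1)| \ge (1-o(1))\,n\,(1-q^{\lceil\alpha k\rceil})$,

$\gamma\delta = (1+o(1))\,\frac{k-\lceil\alpha k\rceil}{1-\beta}\,p$,

$\Delta = (1+o(1))\,np$.

Therefore

$z_{IP} - z_{LP} \ge \left[q^{\lceil\alpha k\rceil}\left((1-p)^m - 1 + mp\right) - \frac{\beta m p}{1-\beta}\left(1-q^{\lceil\alpha k\rceil}\right)\right] n + o(n)$

where $m = k-\lceil\alpha k\rceil$. Note that $\gamma\delta \le 1$ implies $mp \le 1$.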


Next we show that $S_m = (1-p)^m - 1 + mp$ is bounded below by a positive constant. This will imply that $z_{IP} - z_{LP} > 0$ by choosing $\beta$ small enough.

We assume that $\alpha$ is chosen so that $\alpha < 1 - \frac{2}{k}$. This implies $m \ge 2$. Now

$S_m = S_{m-1} + p\,(1-(1-p)^{m-1})$

$\ge S_{m-1} + p\,(1-e^{-(m-1)p})$

$\ge S_2 + p \sum_{i=3}^{m-1} (1-e^{-(i-1)p})$

$\ge p^2 + p\,(m-\lfloor m/2\rfloor)\,(1-e^{-(\lfloor m/2\rfloor-1)p})$.

If k is fixed, then p is bounded below by a constant as a consequence of (25). Therefore $S_m$ is bounded below by a constant.

If $k \to \infty$, then $m \sim (1-\alpha)k$. Thus mp and hence $S_m$ is bounded below by a constant using (25).

This completes the proof of (31). Note that (32) and the condition $\alpha < 1 - \frac{2}{k}$ imply the bound announced in the statement of the theorem.


In [4], a different graphical model is associated with the variation of the k-median problem known as the k-plant location problem. The k-plant location problem is defined using two sets $X = \{X_1,\dots,X_n\}$ and $Y = \{Y_1,\dots,Y_m\}$. The quantity $d_{ij}$ is defined for each $1 \le i \le m$ and $1 \le j \le n$. The problem consists of finding a set $S \subseteq X$, $|S| = k$, that minimizes $\sum_{i=1}^{m} \min_{j \in S} d_{ij}$.

A k-plant location problem arises from a graph G by defining X as its node set, Y as its edge set and $d_{ij} = 0$ if $X_j$ is incident with $Y_i$, 1 otherwise. (The problem is to find k nodes that cover the maximum number of edges of G.) It is shown in [4] that

$z_{IP} = z_{LP}$ almost surely

when $G = G_n(p)$ is a random graph with $0 < \epsilon \le p \le 1-\epsilon$, $\epsilon$ fixed, and $k \le n^a$, $a < 1/6$ fixed.
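The reduction from a graph to a k-plant location instance can be illustrated on a toy example. The sketch below is only an illustration under stated assumptions: the function name `k_plant_value` and the five-node example graph are hypothetical, and the brute-force enumeration is not the method of [4].

```python
from itertools import combinations

def k_plant_value(edges, nodes, k):
    """Brute-force optimum of the k-plant location problem built from a graph."""
    best = None
    for S in combinations(nodes, k):
        chosen = set(S)
        # d_ij = 0 when node X_j is an endpoint of edge Y_i and 1 otherwise,
        # so an edge costs 0 exactly when one of its endpoints is chosen.
        cost = sum(0 if (a in chosen or b in chosen) else 1 for a, b in edges)
        if best is None or cost < best[0]:
            best = (cost, S)
    return best

# Hypothetical example graph: the path 0-1-2-3-4 plus the chord (1, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
cost, S = k_plant_value(edges, range(5), 2)
print(cost, S, len(edges) - cost)  # uncovered edges, chosen nodes, covered edges
```

Minimizing the number of uncovered edges is the same as maximizing the number of covered edges, which is the interpretation given in parentheses above.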


4.  A Tree Model


This section is concerned with the following tree based model: we are given a random tree $T_n$ with node set $X = \{X_1,\dots,X_n\}$ where each of the $n^{n-2}$ different trees is equally likely to occur. The distance $d_{ij}$ is the number of edges in the unique path from $X_i$ to $X_j$ in $T_n$. This section contains a probabilistic result (Theorem 7) and a deterministic one (Theorem 6).

Kolen [13] proved that $z_{IP} = z_{LP}$ for every SPLP defined on a tree. For the k-median problem, this equality does not always hold as shown in Theorem 6. In fact we show in Theorem 7 that, for random trees on n nodes, the number of values of k such that $z_{LP} \ne z_{IP}$ is almost surely at least cn, for some constant c > 0.
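For readers who want to experiment with this model, one standard way to draw a uniformly random labeled tree is to decode a random Pruefer sequence, since such sequences are in bijection with the $n^{n-2}$ labeled trees. The sketch below is an assumption-laden illustration (0-based node labels, hypothetical function names); it is not claimed to be the generator used in the experiments reported later.

```python
import random
from collections import deque

def random_labeled_tree(n, rng):
    """Uniform random labeled tree on nodes 0..n-1 via a random Pruefer sequence."""
    if n < 2:
        return []
    seq = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # Attach the smallest current leaf to the next sequence entry.
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges

def tree_distances(n, edges):
    """d[i][j] = number of edges on the unique path from i to j (BFS from each node)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[w] < 0:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist.append(d)
    return dist

rng = random.Random(0)
tree = random_labeled_tree(12, rng)
dist = tree_distances(12, tree)
```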


Theorem 6

(a)  For $k = 1$ or $k \ge \lceil n/2\rceil - 1$, $z_{IP} = z_{LP}$ for every tree on n nodes.

(b)  For $2 \le k < \lceil n/2\rceil - 1$, and $n \ne 8$, there is a tree on n nodes such that $z_{IP} \ne z_{LP}$.

(c)  There is an infinite family of trees such that

$\frac{z_{IP} - z_{LP}}{z_{IP}} \to \frac{k-1}{2k}$.

It would be interesting to perform a worst-case analysis of the k-median problem and its LP relaxation on trees. We conjecture that the ratio found in (c) is the worst-case bound.


Proof of Theorem 6: For the 1-median problem, it is well-known that $z_{IP} = z_{LP}$ for every choice of $d_{ij}$, $1 \le i,j \le n$. For example, this result appears in Mukendi [18].

When $k \ge \lfloor n/2\rfloor$, $z_{IP} = z_{LP} = n - k$ follows from the fact that every tree on n nodes has a dominating set of cardinality at most $\lfloor n/2\rfloor$. (A tree is bipartite and a color class dominates it.)

To complete the proof of Theorem 6(a), it suffices to consider the case where n is even and $k = \frac{n}{2} - 1$. The only trees which do not have a dominating set of size k are constructed inductively from a path with 4 nodes by adding paths $P = (v_1, v_2', v_3')$ where $v_1$ is one of the nonleaf nodes of the current tree and $v_2'$, $v_3'$ are two new nodes. (See Figure 2(a).) From the construction $z_{IP} = n-k+1 = \frac{n}{2} + 2$. Using the dual values $u_i = 2$ if $X_i$ is a leaf, 1 if not, Lemma 3 yields $z_{LP} \ge \frac{n}{2} + 2$. Therefore $z_{IP} = z_{LP}$.
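The parenthetical remark about color classes can be checked mechanically. The following sketch (hypothetical helper names, 0-based node labels, trees assumed to have at least two nodes) two-colors a tree by breadth-first search and returns the smaller class, which has at most $\lfloor n/2\rfloor$ nodes and dominates the tree because every node has a neighbor of the other color.

```python
from collections import deque

def small_dominating_set(n, edges):
    """Smaller color class of a tree on nodes 0..n-1 (n >= 2)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    color = [-1] * n
    color[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if color[w] < 0:
                color[w] = 1 - color[u]
                queue.append(w)
    classes = [[v for v in range(n) if color[v] == c] for c in (0, 1)]
    return min(classes, key=len)

def dominates(n, edges, S):
    """True when every node is in S or adjacent to a node of S."""
    covered = set(S)
    for a, b in edges:
        if a in S:
            covered.add(b)
        if b in S:
            covered.add(a)
    return len(covered) == n

path = [(0, 1), (1, 2), (2, 3), (3, 4)]
D = small_dominating_set(5, path)
print(D, dominates(5, path, set(D)))
```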

[Figure 2: panels (a), (b) and (c); the drawings are not recoverable.]


To prove Theorem 6(b) when n is odd, consider the tree of Figure 2(b). Let $p = \frac{n-1}{2}$. An optimal solution of the k-median problem is to take $S = \{X_1, X_2, X_4, X_6, \dots, X_{2(k-1)}\}$. Then $z_{IP} = 3p - 2(k-1)$. We get a feasible solution of the LP relaxation by setting $x_1 = \frac{p-k}{p-1}$ and $x_{2i} = \frac{k-1}{p-1}$ for i = 1, ..., p. This yields

$z_{LP} \le (3p^2 - 2pk - p + k - 1)/(p-1)$.

Therefore

$z_{IP} - z_{LP} \ge \frac{k-1}{p-1} > 0$.
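The arithmetic of the odd case can be verified on small instances. The sketch below rests on two assumptions that are not confirmed by the text: that the tree of Figure 2(b) is the "spider" with center $X_1$, hubs $X_{2i}$ adjacent to the center and one leaf $X_{2i+1}$ per hub, and that the fractional assignment fills each node's demand greedily from the nearest fractionally open nodes. Node labels are 0-based and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def spider_distances(p):
    """Center 0, hubs 1..p, leaf p+i attached to hub i (assumed shape of Figure 2(b))."""
    def depth(v):
        return 0 if v == 0 else (1 if v <= p else 2)
    def hub(v):
        return v if v <= p else v - p
    def dist(a, b):
        if a == b:
            return 0
        if a != 0 and b != 0 and hub(a) == hub(b):
            return 1                      # a hub and its own leaf
        return depth(a) + depth(b)        # the path goes through the center
    n = 2 * p + 1
    return [[dist(a, b) for b in range(n)] for a in range(n)]

def integer_optimum(d, k):
    """Exact k-median value by enumeration (small instances only)."""
    n = len(d)
    return min(sum(min(row[j] for j in S) for row in d)
               for S in combinations(range(n), k))

def fractional_value(d, x):
    """Cost when each node takes its unit of demand from the nearest open fractions."""
    total = Fraction(0)
    for row in d:
        need = Fraction(1)
        for j in sorted(range(len(row)), key=lambda j: row[j]):
            take = min(need, x[j])
            total += take * row[j]
            need -= take
            if need == 0:
                break
    return total

def spider_gap(p, k):
    d = spider_distances(p)
    x = [Fraction(0)] * (2 * p + 1)
    x[0] = Fraction(p - k, p - 1)         # x_1 in the text
    for i in range(1, p + 1):
        x[i] = Fraction(k - 1, p - 1)     # x_{2i} in the text
    return integer_optimum(d, k), fractional_value(d, x)

z_ip, z_frac = spider_gap(4, 2)
print(z_ip, z_frac, z_ip - z_frac)
```

Under these assumptions the fractional value coincides with the bound $(3p^2 - 2pk - p + k - 1)/(p-1)$ and the difference with the integer optimum is $(k-1)/(p-1)$.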

To prove Theorem 6(b) when n is even, $n \ne 8$, we first consider the case $k \ge 3$. Add a node $X_{2p+2}$ adjacent to $X_{2p}$ to the tree of Figure 2(b). Then it is optimum to choose $X_{2p}$ in S and we can also choose $x_{2p} = 1$ in the LP solution. Removing $X_{2p}$, $X_{2p+1}$ and $X_{2p+2}$, we are back to the case where n is odd and $k \ge 2$. Now consider the case $n \ge 10$ even and k = 2. Add three nodes to the graph of Figure 2(b), namely $X_{2p+1+i}$ adjacent to $X_{2i}$ for i = 1,2,3. Then $z_{IP} = 3p+3$ but there is a better LP solution, namely $x_1 = 1$ and $x_2 = x_4 = x_6 = 1/3$. This yields $z_{LP} = 3p + 1$.

Finally, to prove Theorem 6(c), consider the tree of Figure 2(c). The node $X_1$ has degree k+1 in the tree. Each branch incident with $X_1$ contains b nonleaf nodes and a leaf nodes where $b \to \infty$, $a \to \infty$ and a grows much faster than b. We denote by $X_2, \dots, X_{k+2}$ the (k+1) nodes of the tree which are incident with leaves. Then, an optimal solution of the k-median problem is $\{X_1, X_2, X_3, \dots, X_k\}$, and

$z_{IP} = (k-1)a + 2(b+1)a + O(kb^2)$

where the last term accounts for all the nonleaf nodes. Ignoring the lower order terms,

$z_{IP} \sim 2ba$.

To get an optimal LP solution, set $x_1 = \frac{1}{k}$ and $x_j = \frac{k-1}{k}$ for j = 2, ..., k+2. Then

$z_{LP} = (k+1)a \times \frac{k-1}{k} + (k+1)a \times (b+1)\frac{1}{k} + O(kb^2)$, i.e.

$z_{LP} \sim \frac{k+1}{k}\, ba$.

Therefore

$\frac{z_{IP} - z_{LP}}{z_{IP}} \to \frac{2 - \frac{k+1}{k}}{2} = \frac{k-1}{2k}$.


In the next theorem we consider all the k-median problems defined on a tree, namely all $1 \le k \le n$ where n is the number of nodes in the tree.

Theorem 7  Let $T_n$ be a random tree. There exists a positive constant c such that, almost surely, $z_{IP} \ne z_{LP}$ for at least cn different values of k.


Proof: Consider a random tree $T_n = (V_n, E_n)$ and a fixed tree T = (V,E). Let $v \in V$. We say that $T_n$ contains a copy of T suspended at v if there exists $V' \subseteq V_n$ such that

(35)  $T_n(V')$ is isomorphic to T under a mapping $\phi : V \to V'$;

(36)  there is a unique edge of $T_n$ with exactly one end, say v', in V' and, in addition, $v' = \phi(v)$.

Let m = |V| and a = the number of automorphisms of T. Then m!/a is the number of distinct labeled graphs on m nodes which are isomorphic copies of T. We first prove that almost surely $T_n$ contains at least $(1-o(1))\,n/(e^m a)$ copies of T suspended at v.

For each $V' \subseteq V_n$, |V'| = m, let

$\delta(V') = 1$ if (35) and (36) hold, and $\delta(V') = 0$ otherwise.

We note that, if $V' \cap V'' \ne \emptyset$, then $\delta(V')\delta(V'') = 0$.

Let $N = \sum_{V' \subseteq V_n,\ |V'| = m} \delta(V')$ = the number of copies of T suspended at v contained in $T_n$.

Now for a fixed copy of T on a set of m nodes, there are $(n-m)^{n-m-1}$ ways of choosing a tree on the remaining n-m nodes and then joining it to v. Thus

$E(N) = \binom{n}{m}\,(m!/a)\,(n-m)^{n-m-1} / n^{n-2} \sim n/(e^m a)$.
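The asymptotic estimate of E(N) can be checked numerically by evaluating the exact expression through logarithms. In the sketch below, m = 15 matches the size of the fixed tree used later in the proof, while the automorphism count a = 2 is a purely hypothetical value chosen for illustration.

```python
import math

def expected_copies(n, m, a):
    """E(N) = C(n,m) (m!/a) (n-m)^(n-m-1) / n^(n-2), evaluated through logarithms."""
    log_binom = math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
    log_val = (log_binom
               + math.lgamma(m + 1) - math.log(a)      # log(m!/a)
               + (n - m - 1) * math.log(n - m)
               - (n - 2) * math.log(n))
    return math.exp(log_val)

m, a = 15, 2  # a = 2 is an assumed automorphism count, for illustration only
for n in (10**3, 10**4, 10**5):
    ratio = expected_copies(n, m, a) / (n / (math.e**m * a))
    print(n, ratio)  # the ratio should approach 1 as n grows
```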


Using the Markov inequality $\Pr(Y \ge a) \le E(Y)/a$ with $Y = (N-E(N))^4$, $E(Y) = \mu_4$ and $a = \lambda^4\mu_4$ we get the Pearson extension of the Chebychev inequality:

$\Pr\{|N-E(N)| \ge \lambda\mu_4^{1/4}\} \le \lambda^{-4}$.


In terms of factorial moments $\mu_4$ is given by

$\mu_4 = \mu_{[4]} - 4\mu_{[1]}\mu_{[3]} + 6\mu_{[1]}^2\mu_{[2]} - 3\mu_{[1]}^4$
$\quad + 6\mu_{[3]} - 12\mu_{[1]}\mu_{[2]} + 6\mu_{[1]}^3$
$\quad + 7\mu_{[2]} - 4\mu_{[1]}^2$
$\quad + \mu_{[1]}$

where $\mu_{[i]}$ is the ith factorial moment,

$\mu_{[i]} = \frac{n!}{(m!)^i\,(n-im)!}\left(\frac{m!}{a}\right)^i \frac{(n-im)^{n-im-2}}{n^{n-2}}\,(n-im)^i$.

We find

$\mu_{[2]} = \mu_{[1]}^2\, e^{-2m/n + O(1/n^2)}$,

$\mu_{[3]} = \mu_{[1]}^3\, e^{-6m/n + O(1/n^2)}$,

$\mu_{[4]} = \mu_{[1]}^4\, e^{-12m/n + O(1/n^2)}$.

In the expression for $\mu_4$ above, the first row contains the terms of order $n^4$. When we evaluate this row we find that the terms in 1 and 1/n of the exponentials disappear simultaneously, leaving a term in $\mu_{[1]}^4\, O(1/n^2)$, i.e. $O(n^2)$. Similarly in the next row (terms of order $n^3$) the terms in 1 of the exponentials disappear simultaneously, leaving $\mu_{[1]}^3\, O(1/n)$, i.e. $O(n^2)$. The last two rows are $O(n^2)$.

Thus $\mu_4 = O(n^2)$, and taking $\lambda$ to be a suitable small positive power of n in the inequality above shows that $|N - E(N)| = o(n)$ almost surely.


Now  we  consider  the  fixed  tree  T  given  in  Figure  3. 


Figure  3. 

Let $S_k$ be an optimal k-median solution in $T_n$. We will let k increase from 1 to n. Consider any copy of T suspended at v contained in $T_n$, say (V',E'). Note that, if $|V' \cap S_k| \ge 1$, then $v \in S_k$. This implies that there exists a K such that for $k \ge K$, $|V' \cap S_k|$ is a nondecreasing function of k which goes from 1 to 15 (= m).

Let $z_{IP}(V') = \sum_{i \in V'} \min_{j \in S_k} d_{ij}$. When $|V' \cap S_k| = 3$, an optimal set $V' \cap S_k$ is $\{v, X_1, X_2\}$ with $z_{IP}(V') = 14$. However, consider the fractional solution $x_1 = x_2 = x_3 = x_4 = \frac{1}{2}$, $x_j = 1$ for the variable associated with node v, and $x_j = 0$ for the other nodes of V'. Let $y_{ij}$ be defined as in Lemma 2 and $z_{LP}(V') = \sum_{i \in V'} \sum_j d_{ij} y_{ij}$. The above fractional solution yields $z_{LP}(V') = 13.5$ and therefore, when $|V' \cap S_k| = 3$, $z_{IP} > z_{LP}$.

Since $T_n$ contains almost surely at least $(1-o(1))\,n/(e^m a)$ copies of T suspended at v, there are at least as many values of k for which $z_{IP} > z_{LP}$.

5.  The  uniform  cost  model. 


In this section, we look briefly at the model where the $d_{ij}$'s are drawn independently from the [0,1] uniform distribution, $1 \le i,j \le n$.

Here we do not assume $d_{ii} = 0$, $d_{ij} = d_{ji}$ or $d_{ij} \le d_{il}+d_{lj}$, as we did in the other models. The quantity $d_{ij}$ is interpreted as the cost of assigning point $X_i$ to point $X_j$.

The main result of this section states that, when $k \ge n(e-1)/e$, then $z_{IP} = z_{LP}$ almost surely, and when $k = o(n/\log n)$, then $\frac{z_{IP}-z_{LP}}{z_{IP}} \sim \frac{k-1}{2k}$ almost surely. The analysis is made possible by the fact that, in those ranges, the k-median problem is almost surely trivial to solve exactly or approximately. (When $k \ge n(e-1)/e$ there is an obvious optimal solution, and when $k = o(n/\log n)$ every solution is close to optimum.)


Theorem 8

(a)  Suppose $k = o(n/\log n)$. Then

$z_{IP} \sim n/(k+1)$  almost surely,

$z_{LP} \sim n/2k$  almost surely.

(b)  Suppose $k \ge (1+o(1))\,n(e-1)/e$. Then

$z_{IP} = z_{LP}$  almost surely.
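The estimate $z_{IP} \sim n/(k+1)$ rests on the fact that the minimum of k independent uniform [0,1] costs has mean 1/(k+1). A small Monte Carlo sketch makes this concrete for a fixed set S of k facilities; the function name, the seed and the sample sizes are arbitrary choices for illustration, and the simulation evaluates a fixed S rather than an optimal one.

```python
import random

def fixed_set_cost(n, k, rng):
    """Cost of serving n points from a fixed set of k facilities with uniform [0,1] costs."""
    # Only the k costs d_ij with j in S matter, so they are sampled directly.
    return sum(min(rng.random() for _ in range(k)) for _ in range(n))

rng = random.Random(1)
n, k = 20000, 4
cost = fixed_set_cost(n, k, rng)
print(cost / n, 1 / (k + 1))  # empirical mean of min d_ij versus 1/(k+1)
```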
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Proof  =  Let  S  be  a  fixed  set  of  size  k.  If  we  take  Xj  =  1  for  jeS  as  our 

solution to the integer program, then the $d_i = \min_{j \in S} d_{ij}$ are independently distributed as the minimum of k uniform [0,1] random variables, i.e.

$\Pr(d_i \ge a) = (1-a)^k$  for $0 \le a \le 1$,

and hence $E(d_i) = 1/(k+1)$ for i = 1, ..., n. We first consider

k  =  O(n^).  Applying  Lemma  1  to  D  =  d^  +  ...  +  dn,  we  have 


smallest out of n independent [0,1] uniform random variables. Thus

$E(d_i) \le (k/n) \sum_{t=1}^{\lceil n/k\rceil} t/(n+1) = (1+o(1))/2k$.

Applying Lemma 1 in the usual way shows that

$z_{LP} \le (1+o(1))\,n/2k$  almost surely.

On the other hand, consider the dual solution $u_i = 1/k$ for i = 1, ..., n. Then, by Lemma 3,

$z_{LP} \ge \sum_{i=1}^{n} u_i - k \max_{j=1,\dots,n} \left(\sum_{i=1}^{n} (u_i-d_{ij})^+\right)$.

As in Theorem 2, for fixed j, we consider random variables $U_i = (u_i-d_{ij})^+$. Setting $u_i = \frac{1}{k}$ we find $E(U_i) = \frac{1}{2k^2}$. Rescaling the $U_i$ to [0,1] and applying Lemma 1,

$\Pr\left(\sum_{i=1}^{n} kU_i \ge (1+\epsilon)\frac{n}{2k}\right) \le e^{-\frac{\epsilon^2}{3}\frac{n}{2k}}$.

Hence, if $n/k = \theta\log n$ where $\theta \to \infty$, then taking $\epsilon \to 0$ slowly enough that $\epsilon^2\theta \to \infty$ yields

$z_{LP} \ge \frac{n}{k} - k(1+\epsilon)\frac{n}{2k^2} = (1-o(1))\frac{n}{2k}$  almost surely.

This completes the proof of Theorem 8(a).

Now consider the case where k is sufficiently large so that each point $X_i$ can be assigned to the cheapest point $X_{j(i)}$, defined by $d_{ij(i)} = \min_{j=1,\dots,n} d_{ij}$. Then clearly $z_{IP} = z_{LP}$.

For j = 1, ..., n, let $N_j$ be the number of points $X_i$ assigned to $X_j$ according to the above scheme. $N_j$ is asymptotically distributed according to a Poisson distribution with mean 1; in particular $\Pr(N_j = 0) \sim 1/e$. Therefore $E(|\{j : N_j = 0\}|) \sim n/e$. To show $Z = |\{j : N_j = 0\}| \le (1+o(1))\frac{n}{e}$ almost surely we use the generalized Markov inequality $\Pr(Z \ge a) \le \frac{E(\phi(Z))}{\phi(a)}$ for any non-negative monotone increasing $\phi$. We let k be a suitable positive power of n, $a = (1+\epsilon)\frac{n}{e}$ where $\epsilon$ is a suitable negative power of n, and $\phi(a) = \max\{0,\ a(a-1)\cdots(a-k+1)\}/k!$, and note that $\phi(Z)$ = the number of k-sets S for which $N_j = 0$, $j \in S$. This gives

$\Pr(Z \ge a) \le \frac{E(\phi(Z))}{\phi(a)} = O((1+\epsilon)^{-k})$.

This completes the proof of Theorem 8.  □
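The threshold $n(e-1)/e$ in part (b) reflects that roughly a fraction 1/e of the points are nobody's cheapest facility. The simulation sketch below illustrates this for one random instance; the function name, seed and instance size are assumptions made for the illustration, and a single sample only gives an approximate value.

```python
import math
import random

def unused_fraction(n, rng):
    """Fraction of points X_j with N_j = 0 in one random uniform cost instance."""
    used = set()
    for _ in range(n):
        costs = [rng.random() for _ in range(n)]          # row d_i1, ..., d_in
        used.add(min(range(n), key=costs.__getitem__))    # cheapest facility j(i)
    return 1 - len(used) / n

rng = random.Random(0)
frac = unused_fraction(500, rng)
print(frac, 1 / math.e)  # the two numbers should be close for large n
```

Opening the remaining points, about $n(1 - 1/e) = n(e-1)/e$ of them, lets every point use its cheapest facility.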

As for the Euclidean and graphical models, we can show:
Theorem 9  Suppose $k = o((n/\log n)^{1/2})$ and $k \to \infty$. Then an LP based branch

and  bound  algorithm  almost  surely  explores  at  least  nodes  of  the 

search  tree. 

Proof: Let $z_{LP}(J_0,J_1)$ be the LP value of the subproblem where $J_0 = \{j : x_j$ is fixed to 0$\}$ and $J_1 = \{j : x_j$ is fixed to 1$\}$. Assume that $\alpha < 1$ is close to 1, that $\beta$ is large and that $\alpha$ and $\beta$ have been chosen so that $\alpha k$ and $\beta k$ are integer. To prove the theorem, it suffices to show that, almost surely,

(37)  for any $J_0, J_1 \subseteq \{1,2,\dots,n\}$ such that $J_0 \cap J_1 = \emptyset$, $|J_1| \le \alpha k$ and $|J_0| \le n-(\beta-\alpha)k$, we have $z_{LP}(J_0,J_1) < z_{IP}$.

As increasing $J_0$ or $J_1$ only serves to increase $z_{LP}$, we can restrict our attention to $|J_1| = \alpha k$ and $|J_0| = n-(\beta-\alpha)k$. Let $L = \alpha k$, $K = \beta k$ and

Y  =  j  =  ^  .  To  obtain  an  upper  bound  on  zLp(J0,J-|),  let 


$x_j = 0$ if $j \in J_0$,
$x_j = 1$ if $j \in J_1$,
$x_j = \gamma$ if $j \notin J_0 \cup J_1$.

Consider a fixed i and suppose $d_{ij_1} \le d_{ij_2} \le \dots \le d_{ij_n}$. Let $t = \min\{s : j_s \in J_1\}$. Let $y_{ij}$ be given by Lemma 2 and $d_i = \sum_j d_{ij} y_{ij}$. The expected value of $d_i$, conditional on knowing the value of t, is

Exp(d.|t)  =  Y  X  kTT  ♦  d-Yt)^ 


f(t-1)t 
2(K+1 ) 


if  t  <  lY  J, 




Exp(d^ 1 1)  =  Y  +  y(y  My  M  ^ \+]  +  1 


•1-tY~1l)(tY~1. 
2 ( K+ 1 ) 


if  t  >  Ly  J 


Pr(t)  = 


(K-t] 
lL-1 1 


Using  (38)  -  (40),  we  get  the  expected  value  of  d^. 


1  l  Y  j  t/  *-  I  Y  J  V  L 

Exp(d.)  =  - -  l  t(f"^j  -  - 2— r  l  (t-l)t(f'h 

1  (K+1)([)  t=1  L_1  2(K+1)(“)  t=1  L_1 


m 

n 




L ' 


t>lY  J 


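Now, as may be inductively verified,

$\binom{K-1}{L-1} + \binom{K-2}{L-1} + \cdots + \binom{L-1}{L-1} = \binom{K}{L}$,

$\binom{K-1}{L-1} + 2\binom{K-2}{L-1} + \cdots + A\binom{K-A}{L-1} = \binom{K+1}{L+1} - A\binom{K-A}{L} - \binom{K+1-A}{L+1}$,

$1\cdot 2\binom{K-2}{L-1} + 2\cdot 3\binom{K-3}{L-1} + \cdots + (A-1)A\binom{K-A}{L-1} = 2\binom{K+1}{L+2} - A(A-1)\binom{K-A}{L} - 2(A-1)\binom{K+1-A}{L+1} - 2\binom{K+2-A}{L+2}$.

Therefore,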


Expfd.^ 


—J 1 -  |fK*l]  .  lfK-lT_1  J1  _  rW-lr''h| 

(K.D(^)  Kl*,)  L  L*’  U 

1  r^K+U  .  -1  -1  ,  lUK-l  Y_1  K 

TiTT  l2lu2^  "  lY  j(Ly  j+1H  l  ) 
2(K+1 )  (£)  L 


■1  .  -I.w.  -1 


fK+1l 

^L+1  *  _  J_ 

<K*1>£)  =  L+1 


f  K+1  \ 

Ylr  xpJ 


(*♦!>(£) 


K->  '  K(L+1  )(L+2) 


[K-Ly-1j) 


<  e  ^Y  Hence,  for  o  and  6  fixed,  where  a  is  close  to  1  and  8 


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large,  and  for  k  -►  ®, 


-1 


■  Is  -  (nS  *  £  Kk^  ')!('•«*» 


-1 


- -  (» *  ^  *  ir)(K'lLY  'KwliJ) 

So  k  ok 

J+0(l)  (-K-Ly  1J'|(i+o('-l) 

2(1-o)k  *•  L 


1  rl  (1— o)(S_o)  (  1  1-a^i  r,  „^1^ 

=  kl;-— 5 — )1  (i*o(j)) 

So 


El’  *  tj-’)2  *  ^  *  •(£  •*  ’-)]  (’-<)) 


Let  e  =  P~  .  Then 

I  -Ql 


Exp(di)  =  ♦  o(~  e  7-a)]  (l+0(^)) 


1  1-ai 


»i  ( 1  1  1-a 

Now  [-  -  11  /  -  e 

va  J  1-a 


♦  ®  as  o  ♦  1 .  Thus,  by  choosing  a  close  to  1, 


we  get 


Exp(d.)  <  I  (1  -  ^  -  l)2j. 


1  1  /2 

Applying  Lemma  1  with  c  =  -)  ,  we  get 



zlp(jo'ji}  5  £  t1  -  -  0  )(i+°(D) 

nB. 

with  probability  at  least  1  -  e  lo®n  where  B  =  1  -  ^[^-l]2.  We  have 
to  branch  for  all  |J^|  <  ok  and  |Jq|  <  n-(s-a)k  with  probability  at  least 


w  . 


(a+B)k 


and 


n 


1  -  .CO  e  3k  l08"  *  '  slnoe  (»)&)*"-  ana  k2  logn  - 

This  proves  that,  almost  surely,  (37)  holds.  As  a  consequence,  the  number  of 
branches  in  the  search  tree  is  at  least  (n~6k)  = 


6.  Computational  Experience 

The previous sections provide asymptotic results as $n \to \infty$. In this section, we report our computational experience with medium-size k-median problems for the four probabilistic models introduced earlier. This computational experience is based on the solution of about 3300 random problems with n = 50 points and an additional 950 random problems with n = 100 points. The description of these problems is given later.

For each problem we computed $z_{IP}$ and $z_{LP}$. The value of $z_{LP}$ was obtained by solving a Lagrangian dual by subgradient optimization as explained in [3]. In the process of computing $z_{LP}$, this algorithm generates a feasible solution at each subgradient iteration. Of course, if it happens that the value of the best feasible solution generated equals $z_{LP}$, the algorithm terminates since, then, $z_{IP} = z_{LP}$. For most of the test problems with no gap $z_{IP} - z_{LP}$, the algorithm terminated in less than 100 subgradient iterations, due to the above stopping criterion. If, after 100 subgradient iterations, there was still a gap between the best feasible solution (an upper bound on $z_{IP}$) and the best Lagrangian relaxation (a lower bound on $z_{LP}$), we resorted to branch and bound to find $z_{IP}$. When the subgradient algorithm clearly converged to a value different from $z_{IP}$, we accepted it as showing that $z_{IP} \ne z_{LP}$. In the cases where the subgradient algorithm converged to a value close to $z_{IP}$ we used the simplex algorithm to compute $z_{LP}$. This allowed us to settle cases where there was a very small but positive gap $z_{IP} - z_{LP}$.
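The general shape of such a procedure can be sketched as follows. This is a minimal illustration and not the implementation of [3]: it assumes the assignment constraints are the ones relaxed, uses a simple Polyak-type step size, and all function names and the random test instance are hypothetical. For multipliers $u_i$, the Lagrangian value is $\sum_i u_i$ minus the sum of the k largest values of $\rho_j = \sum_i (u_i - d_{ij})^+$, and opening those k points gives a feasible solution.

```python
import itertools
import random

def assignment_cost(d, centers):
    """Cost of serving every point from its cheapest chosen center."""
    return sum(min(row[j] for j in centers) for row in d)

def brute_force_optimum(d, k):
    """Exact k-median value by enumeration (small instances only)."""
    return min(assignment_cost(d, S)
               for S in itertools.combinations(range(len(d)), k))

def lagrangian_bounds(d, k, iterations=100):
    """Return (best lower bound, best upper bound) found by subgradient steps."""
    n = len(d)
    u = [sum(row) / n for row in d]            # arbitrary starting multipliers
    best_lb, best_ub = float("-inf"), float("inf")
    for _ in range(iterations):
        rho = [sum(max(0.0, u[i] - d[i][j]) for i in range(n)) for j in range(n)]
        chosen = sorted(range(n), key=lambda j: rho[j], reverse=True)[:k]
        lb = sum(u) - sum(rho[j] for j in chosen)   # Lagrangian value at u
        best_lb = max(best_lb, lb)
        best_ub = min(best_ub, assignment_cost(d, chosen))
        if best_ub - best_lb < 1e-9:
            break                                    # no gap: z_IP = z_LP
        # Subgradient: 1 minus the number of chosen centers "paying" for point i.
        g = [1 - sum(1 for j in chosen if u[i] > d[i][j]) for i in range(n)]
        norm = sum(gi * gi for gi in g)
        if norm == 0:
            break
        step = (best_ub - lb) / norm
        u = [u[i] + step * g[i] for i in range(n)]
    return best_lb, best_ub

rng = random.Random(0)
d = [[rng.random() for _ in range(8)] for _ in range(8)]
lb, ub = lagrangian_bounds(d, 3)
print(lb, brute_force_optimum(d, 3), ub)
```

Every Lagrangian value is a lower bound on $z_{LP}$ and every feasible solution an upper bound on $z_{IP}$, which is why equality of the two certifies that there is no gap.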


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Among  the  4250  test  problems  that  we  generated  we  found  about  3700  such 
that $z_{IP} = z_{LP}$ and about 550 with a gap $z_{IP} - z_{LP}$. Now we give a detailed
description of these results.

The first set of experiments involves Euclidean problems. We decided to test whether approximating the Euclidean distances had an influence on the gap $z_{IP} - z_{LP}$, since we suspected that data accuracy might be partly responsible for the discrepancy between the computational experience previously reported in the literature, namely few test problems were found to have gaps ([2], [3], [6], [10], [11], [19], [20], [23], [24]), and the results of Section 2 stating that asymptotically most instances should have small but positive gaps. To our surprise, data accuracy had little influence except maybe for the possibility that a very coarse approximation produces harder k-median problems. (These problems are more combinatorial, often have alternate optimal solutions and, in our experience, optimality was harder to prove.)

We generated 10 problems, each with 50 points occurring at random in the unit square. Then, for i = 1, 2, 3, 4 and 5, we multiplied each point coordinate by $10^i$ and rounded it to the closest integer value. The Euclidean distances were then computed and rounded to the closest integer. The k-median problem and its LP relaxation were solved for each $2 \le k \le 10$ and $1 \le i \le 5$. For each such pair i,k, Table 1 reports the number of problems (out of 10) with a gap $z_{IP} - z_{LP}$.
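The instance generation just described can be sketched in a few lines. This is an illustrative reading of the procedure, not the original code: the function name is hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for "rounded to the closest integer".

```python
import math
import random

def rounded_euclidean_instance(points, i):
    """Scale unit-square points by 10**i, round coordinates, then round distances."""
    scaled = [(round(x * 10**i), round(y * 10**i)) for x, y in points]
    return [[round(math.dist(a, b)) for b in scaled] for a in scaled]

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(50)]
instances = {i: rounded_euclidean_instance(points, i) for i in range(1, 6)}
print(instances[1][0][:5])  # a few coarse integer distances from the first point
```

With i = 1 the points fall on an 11 by 11 integer grid, so many distances coincide, which matches the remark that very coarse approximations give more combinatorial problems.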


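[Table 1: individual entries not recoverable; the surviving overall total is 23 instances with a gap (out of 450), with row totals out of 90.]

Table 1  Euclidean model with n = 50.
Number of instances with a gap.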


The same two problems were responsible for all the gaps. The average value of $\frac{z_{IP}-z_{LP}}{z_{IP}}$ over the instances that had a gap was approximately 1.5% for i = 1, .4% for i = 2 and for i = 3, 4 and 5. Overall, the fraction of instances with a gap was about 5%. This is consistent with the computational experience reported in the literature. Clearly, the asymptotic behavior described in Section 2 is not felt for problems with n = 50 points. It would be interesting to repeat the computational experiment for Euclidean k-median problems with about n = 1000 points. Unfortunately our computer budget did not allow us to do this.

The  second  set  of  experiments  involves  random  trees.  We  generated  100 
random  trees,  50  of  them  with  n  =  50  nodes  and  the  other  50  with  n=100  nodes, 
using  the  method  described  in  Even  [7].  First  we  assumed  that  all  edge 
lengths  were  equal  to  1  in  the  trees,  and  we  solved  the  k-median  problem  and 
the  LP  relaxation  for  2  <  k  <  11  in  each  tree.  For  each  pair  n,k,  Table  2 


unique path joining them. Table 3 reports these results.


[Table 3: individual entries not recoverable; column totals are out of 100 and the overall total is out of 900.]

Table 3  Tree model with non-unit edge lengths.
Number of instances with a gap.

We  did  not  find  a  significant  difference  in  difficulty  between  the  two 
tree  models.  Overall,  the  fraction  of  instances  with  a  gap  was  about  4%. 

Our  third  set  of  experiments  involves  random  graphs.  First  we  report  the 
results  when  the  edge  lengths  are  equal  to  1.  Starting  from  a  random  tree  on 
50  nodes,  we  generated  a  sequence  of  graphs,  adding  50  random  edges  at  a  time 
to the previous graph. Table 4 contains the value of $z_{LP}$ and $z_{IP}$ for each graph and $2 \le k \le 10$. Only one figure means that $z_{IP} = z_{LP}$. Note that when $z_{IP} = z_{LP} = n-k$ for some graph, it contains a dominating set of size k and therefore every subsequent graph in the sequence also does.

Among the instances where a dominating set did not exist, about 28% had a gap.

Next  we  turn  to  the  graphical  model  with  non-unit  edge  lengths.  We 
started  from  10  random  trees  on  n  =  50  nodes.  We  then  added  random  edges,  50 
at  a  time,  until  the  graphs  contained  849  edges.  The  edge  lengths  were 
computed  using  the  same  scheme  as  earlier.  Namely,  the  nodes  were  assigned 
random  integer  coordinates  in  a  square  of  size  10x10  and  the  length  of  an  edge 
was  the  Euclidean  distance  between  its  two  endpoints,  rounded  to  the  closest 
integer.  The  distance  between  two  nodes  of  the  graph  was  taken  to  be  the 
length  of  the  shortest  path  joining  them  in  the  graph.  Table  5  reports  the 
number  of  instances  with  a  gap  (out  of  10),  as  a  function  of  the  number  of 
edges  in  the  graph  and  k. 
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\k 

Number                      k
of edges     2   3   4   5   6   7   8   9  10 | Total (out of 90)

   49        0   1   1   0   0   0   0   0   1 |    3
   99        1   1   1   2   3   1   0   1   1 |   11
  149        2   1   2   2   1   0   0   0   0 |    8
  199        1   2   1   0   1   0   2   1   2 |   10
  249        1   2   2   1   1   0   1   3   1 |   12
  299        2   1   2   2   1   2   1   1   1 |   13
  349        2   2   4   1   5   1   0   3   2 |   20
  399        1   3   2   0   2   1   0   1   1 |   11
  449        3   2   2   1   1   2   2   1   0 |   14
  499        0   1   1   2   1   1   1   1   0 |    8
  549        1   1   4   0   0   1   2   1   1 |   11
  599        1   1   0   2   2   2   2   0   2 |   12
  649        2   0   1   2   0   0   3   1   1 |   10
  699        0   2   2   1   0   2   2   0   1 |   10
  749        0   1   1   0   1   2   1   1   1 |    8
  799        1   1   0   1   1   1   0   2   3 |   10
  849        0   0   2   0   2   1   0   2   3 |   10

Total       18  22  28  17  22  17  17  19  21 |  181
(column totals out of 170; overall total out of 1530)

Table 5  Graphical model with non-unit edge lengths.
Number of instances with a gap.


For this model, the fraction of instances with a gap was about 12%. The average of $\frac{z_{IP}-z_{LP}}{z_{IP}}$ taken over the instances with a gap was less than 1%. Note that the first line of Table 1 corresponds to the case of the graphical model where the number of edges is $\frac{49 \times 50}{2} = 1225$ and, as such, could be added as a line of Table 5.

Finally,  the  fourth  set  of  experiments  deals  with  the  uniform  cost 
model.  We  generated  30  problems  with  random  integer  costs.  In  the  first  10 
problems  the  costs  were  in  the  range  [1,10],  in  the  next  10  in  the  range 
[1,100]  and  in  the  last  10  in  the  range  [1,1000].  For  each  problem  the  values 
of $z_{IP}$ and $z_{LP}$ were computed for $2 \le k \le 10$. For each range and value of k,
Table  6  contains  the  number  of  instances  with  a  gap  (out  of  10). 


                            k
Range        2   3   4   5   6   7   8   9  10 | Total (out of 90)

[1,10]      10  10  10  10  10  10  10   6   4 |   80
[1,100]     10  10  10  10  10  10  10   8   5 |   83
[1,1000]    10  10  10  10  10  10  10  10   6 |   86

Total       30  30  30  30  30  30  30  24  15 |  249
(column totals out of 30; overall total out of 270)

Table 6  Uniform cost model with n = 50.
Number of instances with a gap.

For this model, the fraction of instances with a gap was about 92% overall, 100% for $k \le 8$. This fits well with the results of Section 5. The value of the ratio $\frac{z_{IP}-z_{LP}}{z_{IP}}$ was much larger than in the other models. It reached 18% for one of the problems with costs taken in the range [1,1000] and k = 3. Note, however, that this is still far below the asymptotic value of 33% predicted by Theorem 8 when k = 3.


7.  The  Simple  Plant  Location  Problem 


Although we proved our probabilistic results for the k-median problem, they can also be useful for the SPLP. To define an instance of SPLP, we need fixed costs $f_j$, j = 1, ..., n, in addition to the distances $d_{ij}$, $1 \le i,j \le n$. For simplicity, we assume in this section that the fixed costs $f_j$ are all identical, $f_j = f$.


Theorem 10  Consider the Euclidean model in the plane and assume that $n^{\epsilon - 1/2} \le f \le n^{1-\epsilon}$ for some fixed $\epsilon > 0$. Then, for the SPLP,

$\frac{z_{IP}-z_{LP}}{z_{IP}} \sim .00189255...$  almost surely.


Proof. In this proof, $z_{IP}$ and $z_{LP}$ denote the optimum values of SPLP and its linear programming relaxation respectively. The solutions of the corresponding k-median problem (with same $d_{ij}$'s) and its relaxation are denoted by $z_{IP}(k)$ and $z_{LP}(k)$ respectively.

By definition $z_{LP} = \min_k\, (z_{LP}(k) + kf) = \min(z_1, z_2, z_3)$, where

$z_1 = \min_{k < \omega}\, (z_{LP}(k) + kf)$,

$z_2 = \min_{\omega \le k \le n/(\omega\log n)}\, (z_{LP}(k) + kf)$, and

$z_3 = \min_{k > n/(\omega\log n)}\, (z_{LP}(k) + kf)$.
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where  c  is  a  constant. 

We have just proved that

$z_{LP} \sim \left(\frac{27}{4}\,\alpha^2 n^2 f\right)^{1/3}$  almost surely.

Similarly, $z_{IP} = \min_k\, (z_{IP}(k) + kf)$. Following the proof of Papadimitriou [22], we can show that

(41)  $z_{IP} = \min_k \left(\frac{\beta n}{\sqrt{k}}\,(1+o(1)) + fk\right)$  almost surely,

where $\beta = .3771967...$ . The minimum in (41) is achieved when $k = \left(\frac{\beta n}{2f}\right)^{2/3}$ and its value is $\left(\frac{27}{4}\,\beta^2 n^2 f\right)^{1/3} (1+o(1))$.

So

$\frac{z_{IP}-z_{LP}}{z_{IP}} \sim \frac{\beta^{2/3}-\alpha^{2/3}}{\beta^{2/3}}$  almost surely.


Similarly,  the  next  result  can  be  shown  using  the  proof  of  Theorem  8. 


Theorem 11 Consider the uniform cost model and assume that
$n^{\varepsilon - 1} < f < n^{1-\varepsilon}$ for some fixed $\varepsilon > 0$. Then

$$\frac{z_{IP} - z_{LP}}{z_{IP}} \sim 1 - \frac{\sqrt{2}}{2} \quad \text{almost surely.}$$
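The constant $1 - \sqrt{2}/2$ is what one obtains if both k-median values decay
like 1/k in k and differ by the factor .5 quoted in the Conclusion. That
functional form is an assumption made here for illustration only, since the
statement of Theorem 8 is not reproduced in this section. Under that
assumption, the following Python sketch minimizes both cost curves over
integer k and compares the resulting gap with the constant of Theorem 11. The
constant `c_ip` and the values of n and f are hypothetical.

```python
import math


def min_over_k(c, n, f):
    """Minimize c*n/k + f*k over integers 1 <= k <= n (assumed cost shape)."""
    return min(c * n / k + f * k for k in range(1, n + 1))


def uniform_gap(n, f, c_ip=1.0, lp_ratio=0.5):
    """Relative gap (z_IP - z_LP)/z_IP under the assumed 1/k cost shape.

    c_ip is a hypothetical constant; the gap does not depend on it.
    lp_ratio is the k-median ratio z_LP/z_IP quoted in the Conclusion.
    """
    z_ip = min_over_k(c_ip, n, f)
    z_lp = min_over_k(lp_ratio * c_ip, n, f)
    return (z_ip - z_lp) / z_ip


gap = uniform_gap(n=100_000, f=1.0)
target = 1.0 - math.sqrt(2.0) / 2.0
```

With these choices `gap` is close to `target`: the two minima are
approximately $2\sqrt{cnf}$ and $2\sqrt{cnf/2}$, whose ratio is $\sqrt{2}/2$.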


8.  Conclusion 


The  LP  relaxation  (1)  -  (4)  has  been  widely  used  in  branch  and  bound 
algorithms  for  the  k-median  problem  and  has  been  reported  to  provide  a  tight 
bound  in  practice.  Our  analysis  shows  that  such  good  results  can  indeed  be 
expected  in  a  probabilistic  sense  for  some  problem  instances,  but  we  also 
identify  other  instances  where  the  LP  relaxation  is  almost  surely  not  tight. 
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The  probabilistic  analysis  is  performed  under  four  classical  models  in 
location  theory,  namely  the  Euclidean,  network,  tree  and  uniform  cost 


models. For example, let $\omega = \omega(n) \to \infty$. When
$\omega \le k \le n/(\omega \log n)$ in the Euclidean model,
$z_{LP}/z_{IP} = .99716\ldots + o(1)$ almost surely, and when
$\omega \le k \le \ldots$ in the uniform cost model, $z_{LP}/z_{IP} = .5 + o(1)$ almost surely.

Our computational experience confirms that large gaps occur frequently in
the uniform cost model whereas only small gaps were observed with the other
models.

Another aspect of the probabilistic analysis performed in Sections 2, 3
and 5 is that, under various assumptions, branch and bound algorithms must
almost surely expand a non-polynomial number of nodes to solve k-median
problems to optimality.


Finally, we mention as open problems the questions of describing the
asymptotic behavior of $z_{LP}/z_{IP}$ as $n \to \infty$ when (i) $k > n/(\omega \log n)$ in the
Euclidean model, (ii) each edge of the graph has a random length $d_{ij}$ (drawn
uniformly in the interval [0,1], say) in the network and tree models, (iii)


. ^  —  <  k  <  in  the  uniform  cost  model, 

log  n  e 
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algorithms that use this relaxation as a bound must almost surely expand a
non-polynomial number of nodes to solve the k-median problem to optimality. Finally, we
report extensive computational experiments. As predicted by the probabilistic
analysis, the relaxation was not as tight for the problem instances drawn from
the uniform cost model as for the other models.
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